Genius - introduction


Genius - introduction HPC team ICTS, Leuven 5th June 2018

VSC HPC environment - GENIUS

VSC Tier-2 environment at KU Leuven (overview diagram):
- ThinKing: 176+32 nodes, 4,160 cores, 2x Intel Ivy Bridge 10 cores, 64 GB / 128 GB RAM (IB QDR); plus 48+96 nodes, 3,456 cores, 2x Intel Haswell 12 cores, 64 GB / 128 GB RAM (IB FDR)
- Cerebro: 1 node, 480 cores, 48x Intel Ivy Bridge 10 cores, 12 TB RAM, 20 TB scratch; 1 node, 160 cores, 16x Intel Ivy Bridge 10 cores, 2 TB RAM (NUMAlink6 / IB FDR)
- Accelerators: 8 nodes with 2x NVIDIA Tesla K20X (2,688 GPGPU cores, 6 GB RAM); 5 nodes with 2x NVIDIA Tesla K40 (2,880 GPGPU cores, 12 GB RAM); 8 nodes with Intel Xeon Phi 5110P (120 coprocessor cores, 8 GB RAM) (IB QDR)
- Genius (2018): 86+10 nodes, 3,456 cores, 2x Intel Skylake 18 cores, 192 GB / 768 GB RAM; 20 GPU nodes, 720 cores, 2x Intel Skylake 18 cores, 4x NVIDIA P100 (IB EDR)
- Login and visualisation nodes: 8 nodes, 160 cores, 2x Intel Ivy Bridge 10 cores, 64 GB RAM; 2 Haswell nodes with 2x NVIDIA Quadro K5200 (8 GB) and 64 GB RAM; 2 nodes, 72 cores, 2x Intel Skylake 18 cores, 384 GB RAM; 2 nodes, 72 cores, 2x Intel Skylake 18 cores, 384 GB RAM with 1x NVIDIA P6000
- Storage: NAS 70 TB (HOME, DATA); GPFS scratch (DDN, 1.2 PB); GPFS archive (DDN, 600 TB)
- External connectivity via Belnet / KU Leuven network

Genius overview (rack layout)
- GPU nodes are distributed over 3 racks: r22g35..41, r23g34..39, r24g35..41
- Compute nodes and large-memory nodes: r22i13n01..24, r22i27n01..24, r23i13n01..24, r23i27n01..24 (24 nodes per chassis/enclosure, in racks r22, r23 and r24)

Genius overview (type of node: CPU type | interconnect | # cores | installed mem | local disk | # nodes)
- Skylake: Xeon 6140 | IB-EDR | 36 | 192 GB | 800 GB | 86
- Skylake large mem: Xeon 6140 | IB-EDR | 36 | 768 GB | 800 GB | 10
- Skylake GPU: Xeon 6140 + 4x P100 SXM2 | IB-EDR | 36 | 192 GB | 800 GB | 20

System comparison, Tier-2: ThinKing (Ivy Bridge) | ThinKing (Haswell) | Genius (2018, Skylake)
- Total nodes: 176 + 32 | 48 + 96 | 86 + 10
- Base clock speed: 2.8 GHz | 2.5 GHz | 2.3 GHz
- Cores per node: 20 | 24 | 36
- Total cores: 4,160 | 3,456 | 3,456
- Memory per node (GB): 64 / 128 | 64 / 128 | 192 / 768
- Memory per core (GB): 3.2 / 6.4 | 2.7 / 5.3 | 5.3 / 21.3
- Peak performance: 4 DP FLOPs/cycle (4-wide AVX addition OR multiplication) | 8 DP FLOPs/cycle (4-wide FMA, AVX2) | 16 DP FLOPs/cycle (8-wide FMA, AVX-512)
- Network: InfiniBand QDR 2:1 | InfiniBand FDR | InfiniBand EDR
- Cache (L1 / L2 / L3): 10x(32i+32d) KB / 10x256 KB / 25 MB | 12x(32i+32d) KB / 12x256 KB / 30 MB | 18x(32i+32d) KB / 18x1024 KB / 25 MB

Skylake compute node (diagram): two sockets, each a separate NUMA node with 18 cores (core0..17 on socket 0, core18..35 on socket 1) sharing an L3 cache; DDR4 memory attached to each socket; QPI links between the sockets; the InfiniBand adapter and I/O attached to socket 0.

GPU comparison (K20Xm | K40c | P100 (2018))
- Total number of nodes: 8 | 5 | 20
- GPUs per node: 2 | 2 | 4
- Total CUDA cores: 2,688 | 2,880 | 3,584
- Memory: 6 GB | 12 GB | 16 GB
- Base clock speed (cores): 732 MHz | 745 MHz | 1,328 MHz
- Max clock speed (cores): 784 MHz | 874 MHz | 1,480 MHz
- Memory bandwidth: 249.6 GB/s | 288 GB/s | 732 GB/s
- Peak double-precision floating-point performance: 1.31 Tflops | 1.43 Tflops | 5.3 Tflops
- Peak single-precision floating-point performance: 3.95 Tflops | 4.29 Tflops | 10.6 Tflops
- Features: SMX, Dynamic Parallelism, Hyper-Q, GPU Boost | SMX, Dynamic Parallelism, Hyper-Q, GPU Boost | NVLink, GPU Boost

Pilot and production phases (diagram): ThinKing (176+32 Ivy Bridge nodes, 48+96 Haswell nodes) and Cerebro (the 480-core and 160-core shared-memory systems) remain on the existing MOAB/Torque and MAM services, while Genius (86+10 Skylake nodes with 192 GB / 768 GB RAM and 20 GPU nodes with 4x NVIDIA P100) is served by a new MOAB/Torque and MAM instance together with the Viewpoint portal, its own login nodes and shared GPFS storage on DDN 14K hardware.

Login nodes
$ ssh vsc3xxxx@<login-node-name>
The login nodes serve different purposes and have different limits:
- 2 login nodes for basic command-line login (different from the ThinKing login nodes):
  login1-tier2.hpc.kuleuven.be
  login2-tier2.hpc.kuleuven.be
- 2 login nodes with visualisation capabilities (NVIDIA Quadro P6000 GPU), command-line login + GPU rendering:
  login3-tier2.hpc.kuleuven.be
  login4-tier2.hpc.kuleuven.be
- 2 NX nodes (nx1, nx2), accessed through the NX server: GUI login to ThinKing, terminal to Genius; use these to open Viewpoint.
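
For example, a terminal login could look like this (vsc30123 is a hypothetical account number, substitute your own VSC account):
$ ssh vsc30123@login1-tier2.hpc.kuleuven.be      # basic command-line login
$ ssh -X vsc30123@login3-tier2.hpc.kuleuven.be   # visualisation login node, with X11 forwarding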

Storage areas (same as on ThinKing)
(Name: variable | type | access | backup | quota)
- /user/leuven/30x/vsc30xxx: $VSC_HOME | NFS | global | yes | 3 GB
- /data/leuven/30x/vsc30xxx: $VSC_DATA | NFS | global | yes | 75 GB
- /scratch/leuven/30x/vsc30xxx: $VSC_SCRATCH, $VSC_SCRATCH_SITE | GPFS | global | no | 100 GB
- /node_scratch (ThinKing): $VSC_SCRATCH_NODE | ext4 | local | no | 100-250 GB
- /node_scratch (Cerebro): $VSC_SCRATCH_NODE | xfs | local | no | 10 TB
- /node_scratch (Genius): $VSC_SCRATCH_NODE | ext4 | local | no | 100 GB
- /staging/leuven/stg_xxxxx: n/a | GPFS | global | no | minimum 1 TB
- /archive/leuven/arc_xxxxx: n/a | Object | global | no (mirror) | minimum 1 TB
- /mnt/beeond/ (Genius): $VSC_SCRATCH_JOB | BeeGFS | nodes in the job | no | 300 GB
To check available space:
$ quota -s                                               ($VSC_HOME and $VSC_DATA)
$ mmlsquota vol_ddn2:leuven_scratch --block-size auto    ($VSC_SCRATCH)

Available GPUs at KU Leuven/UHasselt (Tesla K20 | Tesla K40 | Pascal P100)
- SP cores: 14x192 = 2,688 | 15x192 = 2,880 | 56x64 = 3,584
- DP cores: 14x64 = 896 | 15x64 = 960 | 56x32 = 1,792
- Clock freq. (MHz): 732 | 745 | 1,481
- DRAM (GB): 5.7 | 11.2 | 16
- DRAM freq. (GHz): 2.6 (384-bit) | 3.0 (384-bit) | 0.71 (4096-bit)
- Compute capability: 3.5 | 3.5 | 6.0
- Cache (MB): 1.5 | 1.5 | 4.0
- Constant mem. (KB): 64 | 64 | 64
- Shared mem. per block (KB): 48 | 48 | 48
- Registers per block (x1024): 64 | 64 | 64

GPU peer-to-peer bandwidth
(Topology diagram: GPU 0-3 connected by links of high, medium and low bandwidth.)
Bi-directional P2P, PCIe:
  Dev. ID    0     1     2     3
  0        509    10    19    18
  1         10   508    18    18
  2         19    18   508    10
  3         18    18    10   507
Bi-directional P2P, NVLink (P100 @ Leuven):
  Dev. ID    0     1     2     3
  0        508    37    37    61
  1         37   507    61    37
  2         36    61   508    37
  3         62    37    37   506

How to Start-to-GPU? Approach 1: Users
Does your software already use GPUs? Check the NVIDIA application catalog (https://www.nvidia.com/en-us/data-center/gpu-accelerated-applications/catalog/):
- Machine learning: TensorFlow, Keras, PyTorch, Caffe2, ...
- Chemistry: ABINIT, BigDFT, CP2K, Gaussian, Quantum ESPRESSO, BEAGLE-lib, VASP, ...
- Physics & engineering: OpenFOAM, Fluent, COSMO, ...
- Biophysics: NAMD, CHARMM, GROMACS, ...
- Tools: Allinea Forge, CMake, MAGMA, ...

How to Start-to-GPU? Approach 2: Porting
Incrementally port your code to use GPUs. Check the NVIDIA GPU-accelerated libraries (https://developer.nvidia.com/gpu-accelerated-libraries): cuBLAS, cuFFT, cuSPARSE, cuRAND, Thrust, ...
Replace function calls in your application with their counterparts from the CUDA libraries, e.g. SGEMM(...) -> cublasSgemm(...). (Image taken from the NVIDIA CUDA 9.2 libraries overview.)
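
As a hedged illustration of this library approach, compiling against cuBLAS could look like the lines below; the CUDA module name and the source file are assumptions for the sketch, not taken from the slides:
$ module av CUDA                          # check which CUDA toolkit modules are installed
$ module load CUDA                        # assumed module name; pick a version from the list above
$ nvcc -o my_app my_app.cu -lcublas       # link against cuBLAS instead of a CPU BLAS library
Run my_app inside a GPU job; see the GPU submission options later in these slides.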

How to Start-to-GPU? Approach 3: Developer
Tailor your software development to the GPU hardware, using high-level or low-level APIs:
- Python: Numba, NumbaPro, PyCUDA, Quasar
- MATLAB: overloaded functions and gpuArrays
- R: rcuda, rpud
- Language directives: OpenACC, CUF kernels
- Programming models: CUDA (C/C++/Fortran), OpenCL

Torque/Moab
Jobs have to be submitted from the new (Genius) login nodes. Some commands:
$ qsub : submit a job, returns a job ID
  $ qsub test.sh
  50001435.tier2-p-moab-2.icts.hpc.kuleuven.be
$ qdel <job-id> : delete a queued or running job
  $ qdel 50001435
$ qsub -A lpt2_pilot_2018 : project credits during the pilot phase
Later a project specified with -A (even default_project for the introductory credits) will be required.
CPU nodes: SINGLE-user policy (only 1 user per node); single-core jobs can end up on the same node, but are accounted on a per-job basis.
GPU nodes: SHARED-user policy, multiple users per node are allowed.
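
Putting these options together, a minimal job script might look like this sketch; the toolchain module and the executable are placeholders, and the -A project is the pilot-phase project from this slide:
#!/bin/bash -l
#PBS -l nodes=1:ppn=36
#PBS -l walltime=01:00:00
#PBS -A lpt2_pilot_2018
cd $PBS_O_WORKDIR              # jobs start in $HOME, so change to the submission directory
module load intel/2018a        # placeholder: load whatever toolchain/application you need
./my_program                   # placeholder executable
Submit it from a Genius login node with: $ qsub my_job.pbs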

Moab Allocation Manager
#credits = 0.000278 x walltime (in seconds) x nodes x f_type        (0.000278 is approximately 1/3600)
Project credits are valid for all Tier-2 clusters: ThinKing, Cerebro, GPU nodes and, after the pilot phase, Genius.
f_type per system:
- ThinKing Ivy Bridge: 4.76
- ThinKing Haswell: 6.68
- ThinKing GPU: 2.86
- Cerebro: 3.45
- Genius CPU: 10
- Genius GPU (full node, 4x P100): 20
Examples:
-l nodes=1:ppn=1,walltime=1:00:00  ->  #credits = (0.000278 x 3600 x 1) x 10 = 10
-l nodes=1:ppn=36,walltime=1:00:00 ->  #credits = (0.000278 x 3600 x 1) x 10 = 10
(Both examples give the same number of credits: the charge depends on walltime and nodes, not on ppn.)

BeeOND
BeeOND ("BeeGFS On Demand") was developed to enable easy creation of one or multiple BeeGFS instances on the fly. BeeOND is typically used to aggregate the performance and capacity of internal SSDs or hard disks in the compute nodes for the duration of a compute job. This provides additional performance and a very elegant way of burst buffering.
- Temporary fast storage, available only during job execution
- Dedicated to the user (not shared); SSDs are fast for I/O operations
To schedule a job with a BeeOND file system (see the sketch below):
$ qsub -l nodes=2:ppn=36:beeond
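
A sketch of how a BeeOND job could stage data through the job-private file system; the dataset path and the application are placeholders, only the :beeond feature and the $VSC_* variables come from these slides:
#!/bin/bash -l
#PBS -l nodes=2:ppn=36:beeond
#PBS -l walltime=02:00:00
cp -r $VSC_DATA/my_dataset $VSC_SCRATCH_JOB/        # stage input onto the fast BeeGFS instance
cd $VSC_SCRATCH_JOB
$VSC_DATA/bin/my_io_heavy_app my_dataset            # placeholder for an I/O-intensive application
cp -r results $VSC_DATA/                            # copy results back; the BeeOND FS disappears when the job ends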

Single island
The compute nodes are bundled into several domains (islands). Within one island, the network topology is a 'fat tree' for highly efficient communication; the connection between the islands is much weaker. You can choose to request that a job runs within one island (maximum number of nodes = 24):
$ qsub -l nodes=24:ppn=36:singleisland

Queues
The currently available queues on Genius are: q1h, q24h, q72h and q7d. There will be no 21-day queue during the pilot phase.
As before, we strongly recommend that instead of specifying queue names in your batch scripts you use the PBS -l options to define your needs. Some useful -l options for resource usage (combined in the example below):
-l walltime=4:30:00 (the job will last 4 h 30 min)
-l nodes=2:ppn=36 (the job needs 2 nodes and 36 cores per node)
-l pmem=5gb (the job requests 5 GB of memory per core, which is the default for the thin nodes)
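
For instance, the three options above can be combined on a single submission (myjob.pbs is a placeholder script name):
$ qsub -l walltime=4:30:00 -l nodes=2:ppn=36 -l pmem=5gb myjob.pbs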

Extra submission options
GPUs:
$ qsub -l nodes=1:ppn=1:gpus=1 -l partition=gpu
$ qsub -l nodes=1:ppn=36:gpus=4 -l partition=gpu
Large memory nodes:
$ qsub -l partition=bigmem
Debugging nodes:
$ qsub -l qos=debugging -l partition=gpu
$ qsub -l nodes=1:ppn=36 -l walltime=30:00 \
       -l qos=debugging -l partition=gpu -A lpt2_pilot_2018 \
       myprogram.pbs

Credits (after the pilot phase)
Credit card concept:
- Preauthorization: the balance is held as unavailable until the merchant clears the transaction
- Balance held as unavailable: based on the requested resources (walltime, nodes)
- Actual charge based on what was really used: the used walltime (you pay only for what you use, e.g. when a job crashes)
See the job output file:
  Resource List: neednodes=2:ppn=6,nodes=2:ppn=6,pmem=1gb,walltime=01:00:00
  Resources Used: cput=00:00:00,mem=0kb,vmem=0kb,walltime=00:00:02
How to check your available credits? (no module needed for accounting)
$ mam-balance

Viewpoint portal - Ease-of-use job submission and management
Viewpoint is a rich, easy-to-use portal for end users and administrators, designed to increase productivity through its visual web-based interface, powerful job management features and other workload functions.
- Speeds up the submission process and reduces errors by automating best practices
- Expands the HPC user base to include even non-IT-skilled users
- Helps administrators gain insight into workload and resource utilisation for better management and troubleshooting

Viewpoint portal - Who should use it?
- Researchers who like a GUI or are not very familiar with the Linux command line
- Researchers who work in the NX environment
- Group administrators, who can create templates/workflows for the whole group
- Group members who share data
- Researchers for whom a template already exists (but defining your own templates is also possible)

Viewpoint portal - Interested? Contact us for the initial login procedure.
Setup: access from ThinKing NX (Firefox); later it will be moved outside the HPC environment.
http://tier2-p-viewpoint-1.icts.hpc.kuleuven.be:8081

Viewpoint portal (login and navigation screenshots)

Viewpoint portal - Home

Viewpoint portal - Workload

Viewpoint portal - Templates (contact us for help)

Viewpoint portal - File Manager

Viewpoint portal - Home

Viewpoint portal - Create Job

Viewpoint portal - R with Worker

Viewpoint portal - Free form

Software
Operating system: CentOS 7.4.1708, 64-bit (kernel 3.10.0-693.17.1.el7.x86_64)
Applications
For development: compilers & basic libraries (toolchains), libraries, tools (debuggers, profilers)
Use modules; these are different from the ThinKing modules!

Available toolchains (intel toolchain | foss toolchain)
- Version: 2018a | 2018a
- Compilers: Intel compilers (v 2018.1.163): icc, icpc, ifort | GNU compilers (v 6.4.0-2.28): gcc, g++, gfortran
- MPI library: Intel MPI | OpenMPI
- Math libraries: Intel MKL | OpenBLAS, LAPACK, FFTW, ScaLAPACK
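
A hedged sketch of compiling an MPI program with each toolchain; the module names follow the usual <toolchain>/<version> naming but are assumptions, and the source file is a placeholder:
$ module load intel/2018a
$ mpiicc -O2 -o my_mpi_app my_mpi_app.c     # Intel MPI compiler wrapper around icc
or, with the foss toolchain:
$ module load foss/2018a
$ mpicc -O2 -o my_mpi_app my_mpi_app.c      # OpenMPI compiler wrapper around gcc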

Software
The most commonly used software is installed. Your own builds need to be rebuilt for Genius. If something is missing, please contact us!

Software
By default the 2018a software stack is listed ($ module available).
The module manager is now Lmod. Lmod is a Lua-based module system, but it is fully compatible with the TCL modulefiles we've used in the past. All the module commands that you are used to will work, but Lmod is somewhat faster and adds a few additional features on top of the old implementation.
To (re)compile, ask for an interactive job.
The default version of a module is determined at the time of loading and is subject to change.

Modules
$ module available or module av : lists all installed software packages (e.g. module av R)
$ module av |& grep -i python : shows only the modules that have the string 'python' in their name, regardless of case
$ module load foss : loads the module and adds its commands and libraries to your environment (PATH, etc.)
$ module list : lists all modules loaded in the current session
$ module unload R/3.4.4-intel-2018a-X11-20180131 : removes only the selected module; other loaded modules and dependencies stay loaded
$ module purge : removes all loaded modules from your environment

Modules
$ module swap foss intel : equivalent to module unload foss; module load intel
$ module try-load packagexyz : tries to load a module, with no error message if it does not exist
$ module keyword word1 word2 ... : keyword search; searches any help message or whatis description for the word(s) given on the command line
$ module help foss : prints the help message from the modulefile
$ module spider foss : describes the module

Modules - ml, a convenient shorthand
$ ml : same as module list
$ ml foss : same as module load foss
$ ml -foss : same as module unload foss (not purge!)
$ ml show foss : info about the module
It is possible to create user collections (example below):
module save <collection-name>
module restore <collection-name>
module describe <collection-name>
module savelist
module disable <collection-name>
More info: http://lmod.readthedocs.io/en/latest/010_user.html
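
For instance, a frequently used environment could be saved once and restored later in an interactive session or a job script; the collection name is arbitrary and the R module is the one shown earlier:
$ module load R/3.4.4-intel-2018a-X11-20180131
$ module save my_r_env         # store the currently loaded modules as a named collection
$ module purge
$ module restore my_r_env      # bring the whole collection back with one command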

Questions?
Now, or later via:
- Helpdesk: hpcinfo@icts.kuleuven.be or https://admin.kuleuven.be/icts/hpcinfo_form/hpc-info-formulier
- VSC web site: http://www.vscentrum.be/
- VSC documentation - Genius Quick Start Guide: https://www.vscentrum.be/assets/1355
- Slides from this session are available on the session web page
- VSC agenda: training sessions and events
- Systems status page: http://status.kuleuven.be/hpc or https://www.vscentrum.be/en/user-portal/system-status