
RWTH GPU-Cluster
Sandra Wienke, wienke@rz.rwth-aachen.de
March 2012
Rechen- und Kommunikationszentrum (RZ)
Fotos: Christian Iwainsky

Slide 2: The GPU-Cluster
- GPU-Cluster: 57 NVIDIA Quadro 6000 GPUs in 29 nodes, an innovative computer architecture
- Reasonable usage of resources:
  - Daytime: CAVE (VR): 25 nodes; interactive software development (HPC): 4 nodes
  - Nighttime: processing of GPGPU compute jobs (HPC)
- (Photo: CAVE, VR, RWTH Aachen, since 2004)

Slide 3: GPU-Cluster: Hardware stack

                   4 dialogue nodes    24 rendering nodes    1 head node
  Name             linuxgpud[1-4]      linuxgpus[01-24]      linuxgpum1
  GPUs per node    2                   2                     1

  GPU details      NVIDIA Quadro 6000 (Fermi): 448 cores @ 1.15 GHz, 6 GB RAM, ECC on,
                   max. 1030.4 GFlop/s (SP) / 515.2 GFlop/s (DP)
  Host processor   2 x Intel Xeon X5650 EP (Westmere) @ 2.67 GHz (12 cores per node)
  Host RAM         24 GB / 48 GB (depending on the node)
  Network          QDR InfiniBand

Slide 4: GPU-Cluster: Software stack
- Scientific Linux 6.1
- Modules (as on the new compute cluster)
- CUDA Toolkit 4.0 (3.2): CUDA, OpenCL (1.0) -- module load cuda; installation directory: $CUDA_ROOT
- PGI compiler: PGI Accelerator Model, CUDA Fortran -- module load pgi (or module switch intel pgi)
- Intel OpenCL SDK: OpenCL (1.1) for Intel CPUs
- CUDA debugging: TotalView (module load totalview), DDT (module load ddt)
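To illustrate the module workflow above, here is a minimal interactive session sketch that loads the CUDA module, builds a small CUDA program, and runs it on a dialogue node. The source files vectorAdd.cu and mykernel.cuf are placeholders, not files shipped with the cluster.

  # on a dialogue node, e.g. linuxgpud1
  module load cuda                                  # nvcc, CUDA libraries; sets $CUDA_ROOT
  nvcc -O2 -arch=sm_20 vectorAdd.cu -o vectorAdd    # sm_20 = Fermi (Quadro 6000)
  ./vectorAdd

  # alternatively, PGI Accelerator Model / CUDA Fortran:
  module switch intel pgi
  pgfortran -Mcuda mykernel.cuf -o myapp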

Slide 5: How to use?
- Innovative computer architectures:
  - No real production mode (e.g. group membership needed)
  - Kept as stable and reliable as possible
- Membership in the group "gpu" is mandatory:
  - Send an e-mail to servicedesk@rz.rwth-aachen.de with
    - a short description of your (GPU) application or your purposes,
    - the programming paradigm (e.g. CUDA, OpenCL, ...),
    - single or multi-GPU usage.
  - This grants access to the GPU-Cluster (+ single-GPU machines) and to the GPGPU wiki (full documentation).
- Demo

Slide 6: How to use?
- Interactive mode
  - Short runs/tests only, debugging
  - 1 dialogue node (linuxgpud1): 24/7
  - 2 dialogue nodes (linuxgpud[2,3]): Mon-Fri, 8 am - 8 pm
- Batch mode
  - No interaction; commands are queued and scheduled
  - For performance tests and long runs
  - 24+1 rendering nodes
  - 2 dialogue nodes: Mon-Fri, 8 pm - 8 am; Sat + Sun all day
  - 1 dialogue node (linuxgpud4): Mon-Fri, 8 am - 8 pm, for short test runs during daytime
- Note: nodes are rebooted when switching from interactive to batch mode
- The configuration might change

Slide 7: How to use: Interactive mode
- Log in with your TIM account on the dialogue nodes linuxgpud[1-3]
- GPUs are set to exclusive mode (per process):
  - Only one person (process) can access a GPU at a time
  - If a GPU is occupied, you get a message such as "all CUDA-capable devices are busy or unavailable"
  - If you do not select a specific device in your (CUDA) code, you are automatically scheduled to the other GPU within the node (if available)
- Debugging
  - TotalView and DDT support CUDA Toolkit 3.2 (the default)
  - TotalView 8.9.2 supports CUDA Toolkit 4.0, but is currently not working (due to the NVIDIA driver); cuda-gdb should work
  - Be aware: debuggers usually run on the GPU with ID 0 (and fail if that GPU is occupied)
  - Nodes with a special X configuration for debugging: linuxgpud[1,2]
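Because the graphical debuggers are tied to CUDA Toolkit 3.2 at the moment, cuda-gdb is the fallback for Toolkit 4.0 code. A minimal session sketch (the application myapp and kernel myKernel are placeholders) could look like this; remember that the debugger grabs GPU 0, so it must be free.

  module load cuda
  nvcc -g -G myapp.cu -o myapp       # -g/-G: host and device debug symbols
  cuda-gdb ./myapp
  (cuda-gdb) break myKernel          # break at the (placeholder) kernel
  (cuda-gdb) run
  (cuda-gdb) info cuda threads       # inspect device threads at the breakpoint
  (cuda-gdb) continue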

Slide 8: How to use: Interactive mode
- See what is running: nvidia-smi
- nvidia-smi -q lists detailed GPU information
- The output shows, per GPU: ID, name (type) and bus ID, display mode, fan speed, temperature, power state, ECC errors (SB: single bit, DB: double bit), memory usage, GPU utilization, and compute mode ("E. Process" = exclusive process: one person / one process), plus the list of compute processes with their GPU memory usage
- Example output on linuxgpud1 (NVIDIA-SMI 2.285.05, driver 285.05.09):
  - GPU 0: Quadro 6000, 99% GPU utilization, 208 MB / 5375 MB memory in use, running process "nbody" (PID 30234, 196 MB)
  - GPU 1: Quadro 6000, idle (0% utilization, 22 MB / 5375 MB)

Slide 9: How to use: Batch mode
- Create a batch compute job for LSF
- Select the appropriate queue to get scheduled on the GPU cluster: -q gpu
- Exclusive nodes: nodes are allocated exclusively, i.e. at least 2 GPUs per job. Please use the resources reasonably!
- Submit your job: bsub < mygpuscript.sh
  - It starts running as soon as batch mode starts and the job is scheduled
  - During daytime the pending reason is "Dispatch windows closed"
  - Display the pending reason with: bjobs -p
- Reminder: only one node is in batch mode during daytime (for testing); use -a gpu (instead of -q gpu). Note that -q takes priority over -a.
- More documentation:
  - RWTH Compute Cluster User's Guide: http://www.rz.rwth-aachen.de/hpc/primer
  - Unix cluster documentation: https://wiki2.rz.rwth-aachen.de/display/bedoku/usage+of+the+linux+rwth+compute+cluster
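A typical submit-and-monitor cycle with LSF might then look like the sketch below; mygpuscript.sh stands for a batch script such as the one on the next slide, and <jobid> is the ID reported by bsub.

  bsub < mygpuscript.sh    # submit; the queue (-q gpu) is selected inside the script
  bjobs                    # list your jobs and their states
  bjobs -p <jobid>         # show why a job is pending (e.g. "Dispatch windows closed")
  bpeek <jobid>            # peek at the output of a running job
  bkill <jobid>            # cancel the job if necessary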

Slide 10: Batch script for single GPU (node) usage

  #!/usr/bin/env zsh
  ### Job name
  #BSUB -J GPUTest-Cuda
  ### File / path where STDOUT & STDERR will be written to
  #BSUB -o gputest-cuda.o%J
  ### Request the GPU queue
  #BSUB -q gpu
  ### Request the time you need for execution in [hour:]minute
  #BSUB -W 15
  ### Request virtual memory (in MB)
  #BSUB -M 512

  module load cuda/40
  cd $HOME/NVIDIA_GPU_Computing_SDK_4.0.17/C/bin/linux/release
  ./deviceQuery -noprompt

Note: CUDA code needs the whole virtual address space of the node; the memory limit is therefore currently disabled for the gpu queue.

Slide 11: How to use: GPU + MPI
- Multi-GPU usage with MPI:
  - 1 process per node (ppn): if you want to use only one GPU per node, or if your process uses both GPUs of a node, e.g. via cudaSetDevice
  - 2 processes per node: if each process communicates with one GPU of the node
  - More processes per node: if you have additional processes that only compute on the CPU
- Note: the "exclusive process" mode still restricts each GPU to one process (see the wrapper sketch below)
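For the two-processes-per-node case, one possible way to make the rank-to-GPU mapping explicit is a small wrapper script around the application, sketched below. It assumes Open MPI (which exports OMPI_COMM_WORLD_LOCAL_RANK) and a CUDA version that honours CUDA_VISIBLE_DEVICES; it is not an official RZ recipe. Without it, the "exclusive process" mode simply lets each process fall through to the next free GPU in the node.

  #!/usr/bin/env zsh
  # gpu_wrapper.sh (hypothetical): bind each node-local MPI rank to its own GPU
  # usage: $MPIEXEC -n <#procs> ... gpu_wrapper.sh <prog>
  local_rank=${OMPI_COMM_WORLD_LOCAL_RANK:-0}   # node-local rank, set by Open MPI
  export CUDA_VISIBLE_DEVICES=$local_rank       # rank 0 -> GPU 0, rank 1 -> GPU 1
  exec "$@"                                     # run the actual MPI program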

Slide 12: How to use: GPU + MPI
- Interactive:
  - Specify the GPU hosts (and the number of processes per host), otherwise the job will run on the (non-GPU) compute cluster nodes, e.g.:
      $MPIEXEC -n 3 -H linuxgpud1:1,linuxgpud2:2 <prog>
      $MPIEXEC -n 2 -m 1 -H linuxgpud1,linuxgpud2,linuxgpud3 <prog>
- Batch:
  - Select the MPI flavor: -a {open|intel}mpi
  - Set the number of processes: -n <#procs>
  - Set the number of processes per node (ppn): -R "span[ptile=<ppn>]"
  - To use the batch test node during daytime with MPI: -a "gpu {open|intel}mpi"
  - Note: in batch mode all (working) GPUs are available, including the head node, which has only one GPU
  - To get only machines with TWO attached GPUs: -m bull-gpu-om

Slide 13: Batch script for multi-GPU usage (with MPI)

  #!/usr/bin/env zsh
  ### Job name
  #BSUB -J GPUTestMPI-Cuda
  ### File / path where STDOUT & STDERR will be written to
  #BSUB -o gputestmpi-cuda.o%J
  ### Request the GPU queue
  #BSUB -q gpu
  ### Request the time you need for execution in [hour:]minute
  #BSUB -W 15
  ### Request virtual memory (in MB)
  #BSUB -M 512
  [..]

(Directives known so far; continued on the next slide.)

Slide 14: Batch script for multi-GPU usage (with MPI), continued

  [..]
  ### Request the number of compute slots
  #BSUB -n 4
  ### Set one process per node (ptile = ppn)
  #BSUB -R "span[ptile=1]"
  ### Use Open MPI
  #BSUB -a openmpi

  module load cuda/40
  cd $HOME/simpleMPI
  $MPIEXEC $FLAGS_MPI_BATCH ./simpleMPI

Slide 15: Additional notes: Windows + GPUs
- Access restriction: Windows GPU group; e-mail to servicedesk@rz.rwth-aachen.de
- GPU machine: cluster-win-gpu
  - Currently: ½ NVIDIA Tesla S1070 (2 GT200 GPUs); host: 8-core Intel X5570 (Nehalem) @ 2.93 GHz
  - In the future: NVIDIA Tesla C2050 (1 Fermi GPU); host: 4-core Intel E5620 (Westmere) @ 2.40 GHz
- Interactive + batch mode
- Software: CUDA Toolkit, MATLAB, Parallel Nsight debugger, ...

Slide 16: Batch mode Windows
- Log in with your TIM account (WIN-HPC\xx) on the cluster frontend cluster-win
- Start the HPC Job Manager
- Create a new job

Slide 17: Batch mode Windows
- Select "Job Details"
  - Job template: GPU (sets the GPU resources and allows the group "gpu" to access them)
  - Enter a job name
- Add your tasks
- Submit your job
  - It starts running as soon as batch mode starts and the job is scheduled
- Command line: job /help
- More details: "HPC on Windows - Batch usage - Compute Cluster Scheduler"