Managing and Deploying GPU Accelerators (ADAC17 - Resource Management). Stephane Thiell and Kilian Cavalotti, Stanford Research Computing Center
1 Managing and Deploying GPU Accelerators ADAC17 - Resource Management Stephane Thiell and Kilian Cavalotti Stanford Research Computing Center
2 OUTLINE GPU resources at the SRCC Slurm and GPUs Slurm and GPU P2P Running Amber with GPU P2P (intranode) Running TensorFlow with Singularity
3 GPU RESOURCES AT THE SRCC
4 STANFORD SHERLOCK Shared compute cluster, open to the Stanford community as a resource to support sponsored research. Condo cluster, nodes are ordered quarterly (currently nodes). Heterogeneous cluster with 73 GPU nodes / 500 GPU cards, Tesla and GeForce. Total of ~2500 users (~420 are faculty) and 64 owners. GPU node types include a Supermicro GPU SuperServer 4028GR-TRT (4U) with 8 x Nvidia GeForce consumer-grade cards and a Dell C4130 (1U) with 4 x Nvidia Tesla cards.
5 STANFORD XSTREAM Multi-GPU HPC cluster: 520 x Nvidia K80 (1,040 GPUs, 16 GPUs/node) plus 24 x Nvidia P100E (8 GPUs/node). Energy efficient: 1 PFlops (LINPACK peak) for only 190 kW! Rankings: June 2015 (ISC15) 87th / 6th, Nov 2015 (SC15) 102nd / 5th, June 2016 (ISC16) 122nd / 6th, Nov 2016 (SC16) 162nd / 8th, June 2017 (ISC17) 214th / 24th.
6 XSTREAM SYSTEM CHARACTERISTICS (NODE LAYOUT) XStream's Cray CS-Storm 26268N node specs: 20 CPU cores (12 GB RAM/core), 256 GB of DDR3 RAM, 16 Nvidia K80 GPUs, each with 12 GB of GDDR5 with ECC support, balanced PCIe bandwidth across GPUs (dual Root Complex), 1 x InfiniBand FDR.
7 SLURM AND GPUs
8 SLURM WITH GENERIC RESOURCES (GRES) SLURM is the resource manager on both Sherlock and XStream. Open source and full-featured (including GPU support). Generic Resource (GRES) scheduling: $ srun [...] --gres gpu:2 [...] or #SBATCH --gres gpu:2. This defines the number of GPUs per node. GPU Compute Mode selection: #SBATCH -C gpu_shared. Custom option; sets the GPU Compute Mode to DEFAULT (shared) instead of EXCLUSIVE_PROCESS in the Slurm Prolog. With -C gpu_shared, multiple processes are able to access a GPU. In general NOT recommended, but sometimes required for multi-GPU jobs, for instance when running Amber or LAMMPS.
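As an illustration, here is a minimal batch script sketch combining the GRES request and the gpu_shared constraint described above; the job name, partition name and application command are placeholders, not taken from the slides.
#!/bin/bash
#SBATCH --job-name=gpu_example        # hypothetical job name
#SBATCH --partition=gpu               # assumed GPU partition name
#SBATCH --gres gpu:2                  # request 2 GPUs on the node
#SBATCH -C gpu_shared                 # custom constraint: set GPU Compute Mode to DEFAULT (shared)
#SBATCH --time=0:30:00
# Placeholder application; replace with a real GPU code.
srun nvidia-smi -L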
9 SHERLOCK SLURM GPU QOS SETTINGS Simple enforcement of GPU usage on Sherlock's GPU QoS: MinTRES set to cpu=1,gres/gpu=1. Example of error if the above rule is not respected:
$ srun -p gpu --pty bash
srun: error: Unable to allocate resources: Job violates accounting/qos policy (job submit limit, user's size and/or time limits)
$ srun -p gpu --gres gpu:1 --pty bash
srun: job queued and waiting for resources
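A hedged sketch of how such a rule might be configured with sacctmgr; the QoS name (gpu) is an assumption and the exact setup on Sherlock is not shown in the slides.
# Assumed QoS name "gpu"; MinTRESPerJob requires at least 1 CPU and 1 GPU per job
sacctmgr modify qos gpu set MinTRESPerJob="cpu=1,gres/gpu=1"
# The QoS is then attached to the GPU partition in slurm.conf (sketch):
# PartitionName=gpu ... QOS=gpu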
10 SHERLOCK SLURM GPU FEATURES Sherlock has many different types of Nvidia GPUs. We use Slurm FEATURES (-C) for GPU type selection; GRES gpu:n is used for GPU allocation. We used to specify the GPU type in GRES as in gpu:tesla:2, but using features is more flexible!
# sinfo -o "%.10R %.8D %.10m %.5c %7z %8G %100f %N"
PARTITION  NODES  MEMORY  CPUS  S:C:T   GRES   AVAIL_FEATURES  NODELIST
test                      20    2:10:1  gpu:4  CPU_GEN:BDW,CPU_SKU:E5-2640v4,CPU_FRQ:2.40GHz,GPU_GEN:PSC,GPU_SKU:TESLA_P100_PCIE,GPU_MEM:16GB
test                            2:10:1  gpu:4  CPU_GEN:BDW,CPU_SKU:E5-2640v4,CPU_FRQ:2.40GHz,GPU_GEN:PSC,GPU_SKU:TESLA_P40,GPU_MEM:24GB  sh
ownerxx                         2:10:1  gpu:4  CPU_GEN:BDW,CPU_SKU:E5-2640v4,CPU_FRQ:2.40GHz,GPU_GEN:PSC,GPU_SKU:TITAN_XP,GPU_MEM:12GB  sh-113-[06-07]
ownerxy                         2:10:1  gpu:4  CPU_GEN:BDW,CPU_SKU:E5-2640v4,CPU_FRQ:2.40GHz,GPU_GEN:PSC,GPU_SKU:TESLA_P100_PCIE,GPU_MEM:16GB  sh-112-[06-07]
ownerxy                         2:10:1  gpu:4  CPU_GEN:BDW,CPU_SKU:E5-2640v4,CPU_FRQ:2.40GHz,GPU_GEN:PSC,GPU_SKU:TITAN_XP,GPU_MEM:12GB  sh-112-[08-11]
ownerxz                         2:10:1  gpu:4  CPU_GEN:BDW,CPU_SKU:E5-2640v4,CPU_FRQ:2.40GHz,GPU_GEN:PSC,GPU_SKU:TITAN_XP,GPU_MEM:12GB  sh
owners                          2:10:1  gpu:4  CPU_GEN:BDW,CPU_SKU:E5-2640v4,CPU_FRQ:2.40GHz,GPU_GEN:PSC,GPU_SKU:TESLA_P100_PCIE,GPU_MEM:16GB  sh-112-[06-07]
owners                          2:10:1  gpu:4  CPU_GEN:BDW,CPU_SKU:E5-2640v4,CPU_FRQ:2.40GHz,GPU_GEN:PSC,GPU_SKU:TITAN_XP,GPU_MEM:12GB  sh-112-[08-12]...
Example of GPU type constraint: -C GPU_SKU:TITAN_XP
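For instance, a user can combine a GPU count with a GPU type feature in one request; the partition name below is an assumption and nvidia-smi -L is only a placeholder command.
# Request 2 GPUs of a specific type (Titan Xp) on Sherlock's gpu partition (assumed name)
$ srun -p gpu --gres gpu:2 -C GPU_SKU:TITAN_XP nvidia-smi -L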
11 XSTREAM SLURM CPU/GPU RATIOS Job submission rules are enforced to maximize GPU efficiency:
Max CPU/GPU ratio: 5/4 (20/16)
Default memory per CPU: 12,000 MB
Max memory per CPU: 12,800 MB
Max (system) memory per GPU: 16,000 MB (*)
Min GPU count: 1
(*) Unlike memory/cpu, the number of GPUs is NOT automatically updated when you request more memory.
CPU/GPU ratio enforcement is implemented using a job_submit.lua plugin. Example of error if the above CPU/GPU ratio rule is not respected:
$ srun -c 5 --gres gpu:1 command
srun: error: CPUs requested per node (5) not allowed with only 1 GPU(s); increase the number of GPUs to 4 or reduce the number of CPUs
srun: error: Unable to allocate resources: More processors requested than permitted
12 XSTREAM MULTI-GPU RESOURCE ALLOCATION GPU devices cgroups: Slurm on XStream uses the Linux cgroup devices subsystem so that a job is only allowed to access its allocated GPU devices.
$ srun --gres gpu:3 nvidia-smi -L
GPU 0: Tesla K80 (UUID: GPU-fdae33a e8-1aef-4fa8a745ad07)
GPU 1: Tesla K80 (UUID: GPU-f4735a45-ea34-55f1-35ba-50a84c4b462c)
GPU 2: Tesla K80 (UUID: GPU-b1d8438e-7f05-9bc3-8d3e f)
Consequence: the GPU IDs above and the $CUDA_VISIBLE_DEVICES IDs within a job always start at 0.
Display the full node GPU Direct communication matrix and CPU affinity:
$ srun -c 20 --gres gpu:16 nvidia-smi topo -m
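A quick way to see this renumbering from inside an allocation (a sketch; the exact value depends on the allocation, but with device cgroups it is relative to the job):
# Inside a 3-GPU allocation, CUDA_VISIBLE_DEVICES is renumbered starting at 0
$ srun --gres gpu:3 bash -c 'echo $CUDA_VISIBLE_DEVICES'
# expected to print something like: 0,1,2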
13 SLURM AND GPU PEER TO PEER COMMUNICATION
14 GPU PEER-TO-PEER WITH SLURM ON XSTREAM (1/5) #SBATCH --gres-flags=enforce-binding Standard Slurm option. Enforces GRES/CPU binding, i.e. all CPUs and GRES (here, GPUs) will be allocated within the same CPU socket(s). Required for GPU P2P. Sufficient when used with 1 CPU (-c 1), but USELESS when CPUs are allocated across different CPU sockets. Need to bind tasks (and their CPUs) to a specific CPU socket: #SBATCH --cpus-per-task=1 to 10, for example: -c 8, and #SBATCH --ntasks-per-socket=n, with n == ntasks. Standard Slurm option; masks will automatically be generated to bind the tasks to specific sockets.
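A minimal sketch putting these options together for a single-node, single-task, 8-GPU P2P job; the application line is a placeholder and values follow the examples on the next slides.
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8             # keep <= 10 so the task fits on one socket
#SBATCH --ntasks-per-socket=1         # n == ntasks: bind the task to a single socket
#SBATCH --gres gpu:8
#SBATCH --gres-flags=enforce-binding  # allocate the GPUs on the same socket as the CPUs
#SBATCH --time=0:10:00
# Placeholder: verify the resulting topology instead of running a real P2P application
srun nvidia-smi topo -m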
15 GPU PEER-TO-PEER WITH SLURM ON XSTREAM (2/5) Example of bad resource allocation on XStream for GPU P2P
16 GPU PEER-TO-PEER WITH SLURM ON XSTREAM (3/5) Bad case of GPU P2P allocation: 1 CPU (core 0) and 6 GPUs are already allocated on the first CPU socket; by requesting 8 CPUs and 8 GPUs, we may get 8 CPUs not on the same CPU socket and 8 GPUs not on the same PCIe Root Complex (CPU socket):
$ srun -c8 --gres gpu:8 nvidia-smi topo -m
[nvidia-smi topo -m matrix, partially recovered: the 8 allocated GPUs span both PCIe root complexes, and only GPU pairs behind the same root complex show PIX/PXB links; CPU affinity 1-8]
17 GPU PEER-TO-PEER WITH SLURM ON XSTREAM (4/5) Fix the issue by using the correct Slurm options for GPU P2P: group CPUs on the same socket with --ntasks-per-socket=n, and group GPUs on the same socket with --gres-flags=enforce-binding.
$ srun -c8 --gres gpu:8 --gres-flags=enforce-binding --ntasks-per-socket=1 \
    nvidia-smi topo -m
[nvidia-smi topo -m matrix, partially recovered: all 8 allocated GPUs now sit behind the same PCIe root complex, every GPU pair showing a PIX or PXB link]
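To confirm that peer-to-peer access is actually usable with such an allocation, one could run the p2pBandwidthLatencyTest utility from the CUDA samples; the binary path below is an assumption.
# Check P2P access, bandwidth and latency between the allocated GPUs (path is hypothetical)
$ srun -c8 --gres gpu:8 --gres-flags=enforce-binding --ntasks-per-socket=1 \
    ~/cuda-samples/bin/p2pBandwidthLatencyTest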
18 GPU PEER-TO-PEER WITH SLURM ON XSTREAM (5/5) Example of proper resource allocation on XStream for GPU P2P
19 RUNNING AMBER WITH GPU P2P (INTRANODE)
20 XSTREAM RUNNING AMBER WITH GPU P2P (1/3) AMBER is a popular molecular dynamics simulation package used on both Sherlock and XStream. AMBER is interesting to study as it has the ability to use GPUs to massively accelerate PMEMD for both explicit solvent PME (Particle Mesh Ewald) and implicit solvent GB (Generalized Born). Key new features include (...) peer-to-peer support for multi-GPU runs, providing enhanced multi-GPU scaling.
21 XSTREAM RUNNING AMBER WITH GPU P2P (2/3) AMBER Benchmark: DHFR NPT HMR 4fs = 23,558 atoms. amber_bench_pme_p2p_4gpus.sbatch:
#!/bin/bash
#SBATCH -o slurm_amber_pme_p2p_4gpus.%j.out
#SBATCH --ntasks=4
#SBATCH --ntasks-per-socket=4
#SBATCH --gres gpu:4
#SBATCH --gres-flags=enforce-binding
#SBATCH -C gpu_shared
#SBATCH -t 1:00:00
echo ""
echo "JAC_PRODUCTION_NPT - 23,558 atoms PME 4fs"
echo " "
echo ""
module load intel/ Amber/14
cd PME/JAC_production_NPT_4fs
srun $AMBERHOME/bin/pmemd.cuda.MPI -O -i mdin.gpu -o mdout.4gpu -inf mdinfo.4gpu -x mdcrd.4gpu -r restrt.4gpu
22 XSTREAM RUNNING AMBER WITH GPU P2P (3/3) Overview of CPU / GPU / multi-GPU / multi-GPU P2P performance. Other results (without ECC) are available online.
23 GPU RESOURCE MANAGEMENT WITH CONTAINERS: RUNNING TENSORFLOW WITH SINGULARITY
24 TENSORFLOW MODEL TRAINING (1/4) Software like TensorFlow evolves so quickly that it is too painful to install and upgrade on an HPC cluster on a regular basis, especially on an old OS (XStream is still running RHEL 6.9). Singularity is a container technology developed for HPC. It is also installed on the major XSEDE computing systems. Thanks to the new Nvidia GPU support in Singularity 2.3 (through the --nv option), we can now use Singularity with GPUs on both Sherlock and XStream!
25 TENSORFLOW MODEL TRAINING (2/4) Step 1: get the latest TensorFlow image. Create a new Singularity image and import the Docker image:
$ module load singularity
$ singularity pull docker://tensorflow/tensorflow:latest-gpu
Step 2: TensorFlow container quick test:
$ singularity shell --home $WORK:/home --nv tensorflow-latest-gpu.img
Singularity tensorflow.img:~> python
>>> import tensorflow as tf
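As a quick sanity check that the container actually sees a GPU, one could run something like the following sketch; it assumes a TensorFlow 1.x-era image where tf.test.gpu_device_name() is available.
# Inside a 1-GPU allocation, print the first GPU device seen by TensorFlow
$ srun --gres gpu:1 singularity exec --nv tensorflow-latest-gpu.img \
    python -c "import tensorflow as tf; print(tf.test.gpu_device_name())"
# expected to print something like: /device:GPU:0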
26 TENSORFLOW MODEL TRAINING (3/4) Step 3: run CIFAR10 training on a single GPU (extract of cifar10_1gpu.sbatch for XStream):
#!/bin/bash
#SBATCH --job-name=cifar10_1gpu
#SBATCH --output=slurm_cifar10_1gpu_%j.out
#SBATCH --cpus-per-task=1
#SBATCH --gres gpu:1
#SBATCH --time=1:00:00
TENSORFLOW_IMG=tensorflow-latest-gpu.img
CIFAR10_DIR=PEARC17_ECSS/tensorflow/cifar10
mkdir $LSTOR/cifar10_data
cp -v cifar-10-binary.tar.gz $LSTOR/cifar10_data/
module load singularity
srun singularity exec --home $WORK:/home --bind $LSTOR:/tmp --nv $TENSORFLOW_IMG \
    python $CIFAR10_DIR/cifar10_train.py --batch_size=128 \
    --log_device_placement=false \
    --max_steps=
27 TENSORFLOW MODEL TRAINING (4/4) Step 4: run CIFAR10 training on multiple GPUs (extract of cifar10_2gpu.sbatch for XStream):
#!/bin/bash
#SBATCH --job-name=cifar10_2gpu
#SBATCH --output=slurm_cifar10_2gpu_%j.out
#SBATCH --cpus-per-task=2
#SBATCH --ntasks-per-socket=1
#SBATCH --gres gpu:2
#SBATCH --gres-flags=enforce-binding
#SBATCH --time=1:00:00
TENSORFLOW_IMG=tensorflow-latest-gpu.img
CIFAR10_DIR=PEARC17_ECSS/tensorflow/cifar10
mkdir $LSTOR/cifar10_data
cp -v cifar-10-binary.tar.gz $LSTOR/cifar10_data/
module load singularity
srun singularity exec --home $WORK:/home --bind $LSTOR:/tmp --nv $TENSORFLOW_IMG \
    python $CIFAR10_DIR/cifar10_multi_gpu_train.py --num_gpus=2 \
    --batch_size=64 \
    --log_device_placement=false \
    --max_steps=
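Submitting and monitoring either of these scripts follows the usual Slurm workflow; the job ID below is a placeholder.
$ sbatch cifar10_2gpu.sbatch               # submit the multi-GPU training job
Submitted batch job 123456                 # placeholder job ID
$ squeue -u $USER                          # check the job state
$ tail -f slurm_cifar10_2gpu_123456.out    # follow the training output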
28 CONTACT
29