Introduction to Cheyenne
12 January 2017
Consulting Services Group - Brian Vanderwende

Topics we will cover:
- Technical specs of the Cheyenne supercomputer and the expanded GLADE file systems
- The Cheyenne computing environment
- Accessing software on Cheyenne: compilers, MPI/parallelism
- Submitting batch jobs using the PBS scheduler
- Data storage
- Q&A

Cheyenne is an evolutionary increase from Yellowstone

Yellowstone:
- 1.5 petaflops peak compute
- 4,536 dual-socket nodes (8-core Sandy Bridge), 72,576 cores
- 145 TB total memory (32 GB per node)
- 56 Gb/s interconnects

Cheyenne:
- 5.34 petaflops peak compute
- 4,032 dual-socket nodes (18-core Broadwell), 145,152 cores
- 313 TB total memory (3,164 nodes with 64 GB, 864 nodes with 128 GB)
- 100 Gb/s interconnects

1 Yellowstone core-hour = 0.82 Cheyenne core-hours

The GLADE file systems will be expanded accordingly
- Will continue to use IBM GPFS/Spectrum Scale technology
- Existing capacity: 16 PB; new/added capacity: 21 PB
- Total capacity of 37 PB, with potential for expansion to 58 PB in future upgrades
- More than 2x higher transfer rates on the new GLADE file spaces
- Home spaces (/glade/u) are shared between Yellowstone and Cheyenne!

Timeline for HPC/Cheyenne
1. Test system (Laramie) in place since July
2. Cheyenne assembled in August
3. Cheyenne shipped to NWSC in September
4. NCAR acceptance on 2 January 2017
5. Start of production on Cheyenne: today!
   a. Accelerated Scientific Discovery (ASD) projects begin
   b. General user access in a couple of weeks
      i. Only around 5% of compute is available during the ASD period!
      ii. That period is a good time to do setup, testing, and porting
6. Yellowstone production ends: December 2017

Logging into the new systems
- As before, use your authentication token (YubiKey) along with your username to log in:

  ssh -X -l username cheyenne.ucar.edu

- You will then be on one of six login nodes
- Your default shell is tcsh, but others are available through SAM
- The SUSE Linux OS provides typical UNIX commands

The login nodes are a shared resource - use them lightly!
- As with Yellowstone, the six login nodes on Cheyenne are a shared space
- Your programs compete with those of tens to hundreds of other users for processing and memory
- So limit your usage to:
  - Reading and writing text/code
  - Compiling programs
  - Performing small data transfers (see the example below)
  - Interacting with the job scheduler
- Programs that use excessive resources on the login nodes will be terminated
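As an illustration of a small transfer, a single input file can be staged from your workstation into your GLADE space with scp. This is a minimal sketch only; the file name and destination directory are hypothetical placeholders.

  # Run from your local machine, not from a Cheyenne login node.
  # "username" and the target directory are placeholders.
  scp namelist.input username@cheyenne.ucar.edu:/glade/scratch/username/run_dir/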

CISL builds software for users to load with environment modules
- We build programs and libraries that you can enable by loading an environment module: compilers, MPI, NetCDF, MKL, Python, etc.
- Modules configure your computing environment so you can find binaries/executables, libraries and headers to compile with, and manuals to reference
- Modules are also used to prevent conflicting software from being loaded
- You don't need to use modules, but they simplify things greatly, and we recommend their use

Note that Yellowstone and Cheyenne each have their own module/software tree!

The Cheyenne module tree will add choice and clarity

Yellowstone tree:
  Compiler (Intel, GNU) -> software built with Intel (MKL, netcdf, pnetcdf)

Cheyenne tree:
  Compiler (Intel 16.0.3, Intel 17.0.0, GNU 6.2.0)
    -> compiler-dependent software for Intel 16.0.3 (MKL, netcdf)
    -> MPI libraries for Intel 16.0.3 (SGI MPT 2.15, Intel MPI 5.1.3.210, OpenMPI 10.2.0)
      -> MPI-dependent software (e.g., pnetcdf built with Intel 16.0.3 and MPT 2.15)

Some useful module commands
- module add/remove/load/unload <software>
- module avail - show all community software installed on the system
- module list - show all software currently loaded within your environment
- module purge - clear your environment of all loaded software
- module save/restore <name> - create or load a saved set of software
- module show <software> - show the commands a module runs to configure your environment
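As a minimal sketch of a typical session, the following assumes the module names and versions shown in the Cheyenne tree above; the saved-set name "mydefaults" is a hypothetical example.

  module purge               # start from a clean environment
  module load intel/16.0.3   # select a compiler
  module load mpt/2.15       # select an MPI library built for that compiler
  module load netcdf         # load a library that depends on the compiler
  module list                # confirm what is loaded
  module save mydefaults     # save this set so it can be restored later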

Compiling software on Cheyenne
- We will support Intel, GCC, and PGI
- As on Yellowstone, wrapper scripts are loaded by default (ncarcompilers module) which make including code and linking to libraries much easier

Building with NetCDF using the wrappers:
  ifort model.f90 -o model

Building with NetCDF without the wrappers:
  setenv NETCDF /path/to/netcdf
  ifort -I${NETCDF}/include model.f90 -L${NETCDF}/lib -lnetcdff -o model

Do not expect a parallel program compiled with one MPI library to run using a different library!
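For an MPI code, a similar pattern applies. This is a sketch under the assumption that the Intel compiler, MPT, NetCDF, and ncarcompilers modules are loaded and that the loaded MPI module provides the mpif90 wrapper; the source file name is hypothetical.

  # With intel, mpt, netcdf, and ncarcompilers loaded, the MPI wrapper
  # also handles the NetCDF include and library paths:
  mpif90 model_mpi.f90 -o model_mpi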

Where you compile code depends on where you intend to run it
- Cheyenne has newer Intel processors than Yellowstone and Caldera, which in turn have newer chips than Geyser
- If you must run a code across systems, either:
  1. Compile for the oldest system you want to use, to ensure that results are consistent
  2. For best performance, build executables separately for each system

To access compute resources, use the PBS job manager

LSF (Yellowstone):
  #!/bin/bash
  #BSUB -J WRF_PBS
  #BSUB -P <project>
  #BSUB -q regular
  #BSUB -W 30:00
  #BSUB -n 144
  #BSUB -R span[ptile=16]
  #BSUB -o log.oe
  #BSUB -e log.oe

  # Run WRF with IBM MPI
  mpirun.lsf ./wrf.exe

PBS (Cheyenne):
  #!/bin/bash
  #PBS -N WRF_PBS
  #PBS -A <project>
  #PBS -q regular
  #PBS -l walltime=00:30:00
  #PBS -l select=4:ncpus=36:mpiprocs=36
  #PBS -j oe
  #PBS -o log.oe

  # Run WRF with SGI MPT
  mpiexec_mpt -n 144 ./wrf.exe

A (high-memory) shared queue will be available on Cheyenne

Queue name | Priority | Wall clock (hours) | Nodes     | Queue factor | Description
capability | 1        | 12                 | 1153-4032 | 1.0          | Execution window: midnight Friday to 6 a.m. Monday
premium    | 1        | 12                 | 1152      | 1.5          |
share      | 1        | 6                  | 0.5       | 2.0          | Interactive use for debugging and other tasks on a single, shared, 128-GB node
small      | 1.5      | 2                  | 18        | 1.5          | Interactive and batch use for testing, debugging, profiling; no production workloads
regular    | 2        | 12                 | 1152      | 1.0          |
economy    | 3        | 12                 | 1152      | 0.7          |
standby    | 4        | 12                 | 1152      | 0.0          | Do not submit to standby; it is used when you have exceeded usage or allocation limits

Submitting jobs to and querying information from PBS

To submit a job to PBS, use qsub:
  Script:      qsub job_script.pbs
  Interactive: qsub -I -l select=1:ncpus=36:mpiprocs=36 -l walltime=10:00 -q share -A <project>

- qstat <job_id> - query information about the job
- qstat -u $USER - summary of your active jobs
- qstat -Q <queue> - show the status of the specified queue (or of all queues)
- qdel <job_id> - delete and/or kill the specified job

It is not possible to search for backfill windows in PBS!
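A typical submit-and-monitor cycle, using the PBS script shown earlier (the file name job_script.pbs is a placeholder):

  qsub job_script.pbs    # prints the new job's ID
  qstat -u $USER         # check whether your jobs are queued or running
  qdel <job_id>          # remove a job that is no longer needed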

Using threads/OpenMP to exploit shared-memory parallelism

OpenMP only:
  #!/bin/tcsh
  #PBS -N OPENMP
  #PBS -A <project>
  #PBS -q small
  #PBS -l walltime=10:00
  #PBS -l select=1:ncpus=10
  #PBS -j oe
  #PBS -o log.oe

  # Run program with 10 threads
  ./executable_name

Hybrid MPI/OpenMP:
  #!/bin/tcsh
  #PBS -N HYBRID
  #PBS -A <project>
  #PBS -q small
  #PBS -l walltime=10:00
  #PBS -l select=2:ncpus=36:mpiprocs=1:ompthreads=36
  #PBS -j oe
  #PBS -o log.oe

  # Make sure threads are distributed across the node
  setenv MPI_OPENMP_INTEROP 1

  # Run program with one MPI task and 36 OpenMP
  # threads per node (two nodes)
  mpiexec_mpt ./executable_name

Pinning threads to CPUs with SGI MPT's omplace command
- Normally, threads will migrate across available CPUs throughout execution
- Sometimes it is advantageous to pin threads to a particular CPU (e.g., OpenMP across a socket)

  #PBS -l select=2:ncpus=36:mpiprocs=2:ompthreads=18

  # Need to turn off Intel affinity management, as it interferes with omplace
  setenv KMP_AFFINITY disabled

  # Run program with one MPI task and 18 OpenMP threads per socket
  # (two per node, with two nodes)
  mpiexec_mpt omplace ./executable_name

Managing your compute time allocation
- After compiling a program, try running small test jobs before your large simulation
- For single-core jobs, use the share queue to avoid being charged for unused core-hours
- Charges are computed as follows:
  Exclusive: wall-clock hours x nodes used x 36 cores per node x queue factor
  Shared: core-seconds / 3600 x queue factor
- Use the DAV clusters for NCL, Python, MATLAB, and R scripts and for interactive visualization (VAPOR)
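As a worked example with hypothetical numbers: an exclusive job that runs for 2 wall-clock hours on 4 nodes in the regular queue (queue factor 1.0) is charged 2 x 4 x 36 x 1.0 = 288 core-hours, while a share-queue job using one core for the same 2 hours (7,200 core-seconds, queue factor 2.0) is charged 7200 / 3600 x 2.0 = 4 core-hours.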

How to store data on Cheyenne

File space | Quota | Data safety | Description
Home (/glade/u/home/$user) | 25 GB | Backups & snapshots | Store settings, code, and other valuables
Work (/glade/p/work/$user) | 512 GB | Stable but no backups | Good place for keeping run directories and input data
Project (/glade/p/project) | Varies | Stable but no backups |
HPSS (hsi -> /home/$user) | Charged per TB/yr | Stable but no backups | Storage limits depend on your allocation; data cannot be used interactively
Scratch (/glade/scratch/$user) | 10 TB | At-risk - purged! | Use as temporary data storage only; manually back up files (e.g., to HPSS)

Yellowstone users will have two scratch spaces for a short time

Phase 1 (very soon):
- Yellowstone: /glade/scratch (5 PB, GLADE1); the Cheyenne scratch space is visible as /glade/scratch_cheyenne
- Cheyenne: /glade/scratch (15 PB, GLADE2)

Phase 2 (weeks later):
- Yellowstone: the old space becomes /glade/scratch_old (5 PB, READ-ONLY!), and /glade/scratch points to the 15 PB GLADE2 space
- Cheyenne: /glade/scratch (15 PB, GLADE2)

Files will still be purged on scratch_old, so move them elsewhere!

Storage tips
- Keep track of your allocations using gladequota
- Archive large numbers of small files to limit wasted space on the GLADE spaces
- If data is not needed for immediate access, move it to the HPSS tape archive:
  hsi cput <filename>
  hsi cget <filename>
- Large collections of files can be combined while transferring to HPSS using HTAR - efficient! (see the sketch below)
  htar -cvf <archive.tar> <directory>
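A minimal sketch of archiving a run directory and retrieving a single file later; the directory and file names are hypothetical.

  # Bundle a whole directory into one archive on HPSS (avoids many small files)
  htar -cvf run_2017.tar ./run_2017

  # Copy a single file to HPSS, and fetch it back later
  hsi cput namelist.output
  hsi cget namelist.output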

The future of the DAV systems
- Geyser and Caldera will continue to serve as data analysis and visualization machines
- Integration with Cheyenne is still TBD
  - The current plan is to make 4 of the 12 Geyser nodes available within Cheyenne using the SLURM scheduler
  - Caldera will likely be accessible only from Yellowstone
- We are in the early stages of a procurement for a Geyser replacement and a many-core system
  - Production target: 2018

Things to keep in mind...
- Yellowstone, Geyser, and Caldera will continue to run the LSF scheduler, so keep your job scripts organized.
- Access to the file systems from both clusters should make data management easier, but pay attention to where you have compiled your programs.
- If you want to configure settings in your startup files for Yellowstone and Cheyenne, make sure each setting only runs on the intended system...

How to make .tcshrc/.profile machine specific

~/.tcshrc (tcsh):
  tty > /dev/null
  if ( $status == 0 ) then
      alias rm rm -i
      set prompt = "%n@%m:%~"
      if ( $HOSTNAME =~ yslogin* ) then
          # Yellowstone settings
          alias bstat bjobs -u all
      else
          # Cheyenne settings
          alias qjobs qstat -u $USER
      endif
  endif

~/.profile (bash):
  alias rm='rm -i'
  PS1="\u@\h:\w> "
  if [[ $HOSTNAME == yslogin* ]]; then
      # Yellowstone settings
      alias bstat='bjobs -u all'
      source .profile-ys
  else
      # Cheyenne settings
      alias qjobs='qstat -u $USER'
      source .profile-ch
  fi

CISL Helpdesk/Consulting
https://www2.cisl.ucar.edu/user-support/getting-help
- Walk-in: ML 1B Suite 55
- Email: cislhelp@ucar.edu
- Phone: 303-497-2400

Specific questions from today and/or feedback:
- Email: vanderwb@ucar.edu

For science questions (e.g., running CESM/WRF), consult the relevant support resources