Introduction to Cheyenne
12 January 2017
Consulting Services Group
Brian Vanderwende
Topics we will cover
- Technical specs of the Cheyenne supercomputer and expanded GLADE file systems
- The Cheyenne computing environment
- Accessing software on Cheyenne
- Compilers
- MPI/parallelism
- Submitting batch jobs using the PBS scheduler
- Data storage
- Q&A
Cheyenne is an evolutionary increase from Yellowstone

Yellowstone:
- 1.5 petaflops peak compute
- 4,536 dual-socket nodes with 8-core Sandy Bridge processors (72,576 cores)
- 145 TB total memory (32 GB per node)
- 56 Gb/s interconnects

Cheyenne:
- 5.34 petaflops peak compute
- 4,032 dual-socket nodes with 18-core Broadwell processors (145,152 cores)
- 313 TB total memory (3,168 nodes with 64 GB; 864 nodes with 128 GB)
- 100 Gb/s interconnects

1 Yellowstone core-hour = 0.82 Cheyenne core-hours
The GLADE file systems will be expanded accordingly
- Will continue to use IBM GPFS/Spectrum Scale technology
- Existing capacity: 16 PB
- New/added capacity: 21 PB
- Total capacity of 37 PB, with potential for expansion to 58 PB in future upgrades
- More than 2x higher transfer rates on the new GLADE file spaces
- Home spaces (/glade/u) are shared between Yellowstone and Cheyenne!
Timeline for HPC/Cheyenne
1. Test system (Laramie) in place since July
2. Cheyenne assembled in August
3. Cheyenne shipped to NWSC in September
4. NCAR acceptance on 2 January 2017
5. Start of production on Cheyenne: today!
   a. Accelerated Scientific Discovery (ASD) projects begin
   b. General user access in a couple of weeks
      i.  Only around 5% of compute available during the ASD period!
      ii. That period is a good time to do setup, testing, and porting
6. Yellowstone production ends: December 2017
Logging into the new systems
- As before, use your authentication token (YubiKey) along with your username to log in:
    ssh -X -l username cheyenne.ucar.edu
- You will then be on one of six login nodes
- Your default shell is tcsh, but others are available through SAM
- SUSE Linux OS provides typical UNIX commands
The login nodes are a shared resource - use them lightly!
- As with Yellowstone, the six login nodes on Cheyenne are a shared space
- Your programs compete with those of 10-100s of other users for processing and memory
- So limit your usage to:
  - Reading and writing text/code
  - Compiling programs
  - Performing small data transfers
  - Interacting with the job scheduler
- Programs that use excessive resources on the login nodes will be terminated
CISL builds software for users to load with environment modules
- We build programs and libraries that you can enable by loading an environment module
  - Compilers, MPI, NetCDF, MKL, Python, etc.
- Modules configure your computing environment so you can find binaries/executables, libraries and headers to compile with, and manuals to reference
- Modules are also used to prevent conflicting software from being loaded
- You don't need to use modules, but they simplify things greatly, and we recommend their use
Note that Yellowstone and Cheyenne each have their own module/software tree!
The Cheyenne module tree will add choice and clarity

Yellowstone:
- Compilers: Intel, GNU
  - Software built with Intel: MKL, netcdf, pnetcdf

Cheyenne:
- Compilers: Intel 16.0.3, Intel 17.0.0, GNU 6.2.0
  - Software built with Intel 16.0.3: MKL, netcdf
  - MPI libraries for Intel 16.0.3: SGI MPT 2.15, Intel MPI 5.1.3.210, OpenMPI 10.2.0
    - Software built with Intel 16.0.3 + MPT 2.15: pnetcdf
Some useful module commands (see the example session below)
- module add/remove/load/unload <software>
- module avail - show all community software installed on the system
- module list - show all software currently loaded within your environment
- module purge - clear your environment of all loaded software
- module save/restore <name> - create or load a saved set of software
- module show <software> - show the commands a module runs to configure your environment
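For example, a minimal session that loads the newer Intel compiler from the module tree shown earlier and saves the result as a named set might look like this (a sketch; the module names follow the earlier slide, but exact versions on the system may differ):

    module load intel/17.0.0      # load the newer Intel compiler
    module load netcdf            # netcdf build matching the loaded compiler
    module list                   # confirm what is now in your environment
    module save my_netcdf_env     # save this set of modules under a name
    module restore my_netcdf_env  # recreate the same environment in a later session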
Compiling software on Cheyenne
- We will support Intel, GCC, and PGI
- As on Yellowstone, wrapper scripts are loaded by default (ncarcompilers module) which make including code and linking to libraries much easier
- Building with netcdf using the wrappers:
    ifort model.f90 -o model
- Building with netcdf without the wrappers:
    setenv NETCDF /path/to/netcdf
    ifort -I${NETCDF}/include model.f90 -L${NETCDF}/lib -lnetcdff -o model
- Do not expect a parallel program compiled with one MPI library to run using a different library!
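As a sketch of the MPI case, assuming the default MPT module is loaded and that the environment provides the usual mpif90 wrapper (the source file name here is hypothetical):

    mpif90 mpi_model.f90 -o mpi_model    # compile and link against the loaded MPI library
    mpiexec_mpt -n 36 ./mpi_model        # run with the same MPI library it was built with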
Where you compile code depends on where you intend to run it
- Cheyenne has newer Intel processors than Yellowstone and Caldera, which in turn have newer chips than Geyser
- If you must run a code across systems, either:
  1. Compile for the oldest system you want to use, to ensure that results are consistent
  2. For best performance, build executables separately for each system
To access compute resources, use the PBS job manager

LSF (Yellowstone):
    #!/bin/bash
    #BSUB -J WRF_LSF
    #BSUB -P <project>
    #BSUB -q regular
    #BSUB -W 0:30
    #BSUB -n 144
    #BSUB -R span[ptile=16]
    #BSUB -o log.oe
    #BSUB -e log.oe

    # Run WRF with IBM MPI
    mpirun.lsf ./wrf.exe

PBS (Cheyenne):
    #!/bin/bash
    #PBS -N WRF_PBS
    #PBS -A <project>
    #PBS -q regular
    #PBS -l walltime=00:30:00
    #PBS -l select=4:ncpus=36:mpiprocs=36
    #PBS -j oe
    #PBS -o log.oe

    # Run WRF with SGI MPT
    mpiexec_mpt -n 144 ./wrf.exe
A (high-memory) shared queue will be available on Cheyenne

Queue name | Priority | Wall clock (hours) | Nodes | Queue factor | Description
capability | 1        | 12                 | 1153-4032 | 1.0 | Execution window: midnight Friday to 6 a.m. Monday
premium    | 1        | 12                 | 1152      | 1.5 |
share      | 1        | 6                  | 0.5       | 2.0 | Interactive use for debugging and other tasks on a single, shared, 128-GB node
small      | 1.5      | 2                  | 18        | 1.5 | Interactive and batch use for testing, debugging, profiling; no production workloads
regular    | 2        | 12                 | 1152      | 1.0 |
economy    | 3        | 12                 | 1152      | 0.7 |
standby    | 4        | 12                 | 1152      | 0.0 | Do not submit to standby; used when you have exceeded usage or allocation limits
Submitting jobs to and querying information from PBS
- To submit a job to PBS, use qsub:
  - Script: qsub job_script.pbs
  - Interactive: qsub -I -l select=1:ncpus=36:mpiprocs=36 -l walltime=10:00 -q share -A <project>
- qstat <job_id> - query information about the job
- qstat -u $USER - summary of your active jobs
- qstat -Q <queue> - show status of specified or all queues
- qdel <job_id> - delete and/or kill the specified job
- It is not possible to search for backfill windows in PBS!
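Putting these commands together, a typical cycle might look like the following (the script name and job ID are hypothetical):

    qsub wrf_run.pbs      # submit the batch script; PBS prints a job ID
    qstat -u $USER        # check on all of your queued and running jobs
    qdel 123456           # remove the job if something went wrong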
Using threads/OpenMP to exploit shared-memory parallelism

Only OpenMP:
    #!/bin/tcsh
    #PBS -N OPENMP
    #PBS -A <project>
    #PBS -q small
    #PBS -l walltime=10:00
    #PBS -l select=1:ncpus=10
    #PBS -j oe
    #PBS -o log.oe

    # Run program with 10 threads
    ./executable_name

Hybrid MPI/OpenMP:
    #!/bin/tcsh
    #PBS -N HYBRID
    #PBS -A <project>
    #PBS -q small
    #PBS -l walltime=10:00
    #PBS -l select=2:ncpus=36:mpiprocs=1:ompthreads=36
    #PBS -j oe
    #PBS -o log.oe

    ### Make sure threads are distributed across the node
    setenv MPI_OPENMP_INTEROP 1

    # Run program with one MPI task and 36 OpenMP
    # threads per node (two nodes)
    mpiexec_mpt ./executable_name
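These scripts use tcsh syntax because tcsh is the default shell; if you write your job scripts in bash instead, the equivalent of the setenv line above is:

    export MPI_OPENMP_INTEROP=1    # bash equivalent of: setenv MPI_OPENMP_INTEROP 1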
Pinning threads to CPUs with SGI MPT's omplace command
- Normally threads will migrate across available CPUs throughout execution
- Sometimes it is advantageous to pin threads to a particular CPU (e.g., OpenMP across a socket)

    #PBS -l select=2:ncpus=36:mpiprocs=2:ompthreads=18

    # Need to turn off Intel affinity management as it interferes with omplace
    setenv KMP_AFFINITY disabled

    # Run program with one MPI task and 18 OpenMP threads per socket
    # (two per node with two nodes)
    mpiexec_mpt omplace ./executable_name
Managing your compute time allocation
- After compiling a program, try running small test jobs before your large simulation
- For single-core jobs, use the share queue to avoid being charged for unused core-hours (see the worked example below):
  - Exclusive: core-hours = wall-clock hours × nodes used × 36 cores per node × queue factor
  - Shared: core-hours = (core-seconds / 3600) × queue factor
- Use the DAV clusters for NCL, Python, MATLAB, and R scripts and interactive visualization (VAPOR)
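A worked example with assumed numbers: a 2-hour exclusive job on 4 nodes in the regular queue (queue factor 1.0) is charged 2 × 4 × 36 × 1.0 = 288 core-hours, while a single-core 2-hour job in the share queue (queue factor 2.0) uses 7,200 core-seconds and is charged (7,200 / 3600) × 2.0 = 4 core-hours.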
How to store data on Cheyenne

File space | Quota | Data safety | Description
Home (/glade/u/home/$USER) | 25 GB | Backups & snapshots | Store settings, code, and other valuables
Work (/glade/p/work/$USER) | 512 GB | Stable but no backups | Good place for keeping run directories and input data
Project (/glade/p/project) | Varies | Stable but no backups |
HPSS (hsi -> /home/$USER) | TB/yr charge | Stable but no backups | Storage limits depend on your allocation; data cannot be used interactively
Scratch (/glade/scratch/$USER) | 10 TB | At-risk - purged! | Use as temporary data storage only; manually back up files (e.g., to HPSS)
Yellowstone users will have two scratch spaces for a short time

Phase 1 (very soon):
- Yellowstone: /glade/scratch is the existing GLADE1 space (5 PB); the new Cheyenne scratch is mounted as /glade/scratch_cheyenne
- Cheyenne: /glade/scratch is the new GLADE2 space (15 PB)

Phase 2 (weeks later):
- Yellowstone: the old space is remounted as /glade/scratch_old (GLADE1, 5 PB, READ-ONLY!); /glade/scratch now points to the new GLADE2 space (15 PB)
- Cheyenne: /glade/scratch is unchanged (GLADE2, 15 PB)

Files will still be purged on scratch_old, so move them elsewhere!
Storage tips
- Keep track of your allocations using gladequota
- Archive large numbers of small files to limit wasted space on GLADE spaces
- If data is not needed for immediate access, move it to the HPSS tape archive:
    hsi cput <filename>
    hsi cget <filename>
- Large collections of files can be combined while transferring to HPSS using HTAR. Efficient!
    htar -cvf <archive.tar> <directory>
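For example, to bundle a run directory into a single archive on HPSS and pull it back later (the directory and archive names here are hypothetical):

    htar -cvf run_2017.tar run_output/    # create the archive directly on HPSS
    htar -xvf run_2017.tar                # retrieve and extract it later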
The future of the DAV systems
- Geyser and Caldera will continue to serve as data analysis and visualization machines
- Integration with Cheyenne is still TBD
  - Current plan is to make 4 of the 12 Geyser nodes available within Cheyenne using the SLURM scheduler
  - Caldera will likely be accessible only from Yellowstone
- In the early stages of a procurement for a Geyser replacement and a many-core system
  - Production target: 2018
Things to keep in mind...
- Yellowstone, Geyser, and Caldera will continue to run the LSF scheduler. Keep your job scripts organized.
- Access to file systems from both clusters should make data management easier, but pay attention to where you have compiled programs.
- If you want to configure settings in your startup files for Yellowstone and Cheyenne, you should make sure they only run on the intended system...
How to make .tcshrc/.profile machine specific

~/.tcshrc:
    tty > /dev/null
    if ( $status == 0 ) then
        alias rm 'rm -i'
        set prompt = "%n@%m:%~"
        if ( $HOSTNAME =~ yslogin* ) then
            # Yellowstone settings
            alias bstat 'bjobs -u all'
        else
            # Cheyenne settings
            alias qjobs 'qstat -u $USER'
        endif
    endif

~/.profile (bash):
    alias rm='rm -i'
    PS1="\u@\h:\w> "
    if [[ $HOSTNAME == yslogin* ]]; then
        # Yellowstone settings
        alias bstat='bjobs -u all'
        source .profile-ys
    else
        # Cheyenne settings
        alias qjobs='qstat -u $USER'
        source .profile-ch
    fi
CISL Helpdesk/Consulting
https://www2.cisl.ucar.edu/user-support/getting-help
- Walk-in: ML 1B Suite 55
- Email: cislhelp@ucar.edu
- Phone: 303-497-2400

Specific questions from today and/or feedback:
- Email: vanderwb@ucar.edu

For science questions (e.g., running CESM/WRF), consult relevant support resources