Introduction to SLURM & SLURM batch scripts


Anita Orendt
Assistant Director, Research Consulting & Faculty Engagement
anita.orendt@utah.edu
16 Feb 2017

Overview of Talk
- Basic SLURM commands
- SLURM batch directives
- Accounts and partitions
- SLURM environment variables
- SLURM batch scripts
- Running an interactive batch job
- Where to get more information

Basic SLURM commands
- sinfo - shows partition/node state
- sbatch <scriptname> - launches a batch script
- squeue - shows all jobs in the queue
- squeue -u <username> - shows only your jobs
- scancel <jobid> - cancels a job
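As a quick illustration of a typical workflow (the script name and job ID below are placeholders):

sinfo                      # check partition and node states
sbatch myjob.slurm         # submit a batch script; sbatch prints the job ID
squeue -u $USER            # list only your jobs
scancel 1234567            # cancel a job, using the ID printed by sbatch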

Some Useful Aliases

Tcsh, to add to your .aliases file:
# SLURM aliases that provide information in a useful manner for our clusters
alias si 'sinfo -o "%20P %5D %14F %8z %10m %11l %16f %N"'
alias si2 'sinfo -o "%20P %5D %6t %8z %10m %10d %11l %N"'
alias sq 'squeue -o "%8i %12j %4t %10u %20q %20a %10g %20P %10Q %5D %11l %11L %R"'

Bash, to add to your .aliases file:
alias si="sinfo -o \"%20P %5D %14F %8z %10m %10d %11l %16f %N\""
alias si2="sinfo -o \"%20P %5D %6t %8z %10m %10d %11l %16f %N\""
alias sq="squeue -o \"%8i %12j %4t %10u %20q %20a %10g %20P %10Q %5D %11l %11L %R\""
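If your shell startup files source ~/.aliases (adjust the path if your setup differs), the aliases can then be used directly, for example:

source ~/.aliases          # reload the aliases in the current bash session
si                         # condensed partition overview
sq                         # condensed queue listing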

SLURM Batch Directives
#SBATCH --time=1:00:00                  wall time of a job (or -t)
#SBATCH --partition=name                partition to use (or -p)
#SBATCH --account=name                  account to use (or -A)
#SBATCH --nodes=2                       number of nodes (or -N)
#SBATCH --ntasks=12                     total number of tasks (or -n)
#SBATCH --mail-type=FAIL,BEGIN,END      events on which to send email
#SBATCH --mail-user=name@example.com    email address to use
#SBATCH -o slurm-%j.out-%N              name for stdout; %j is the job number, %N the node name
#SBATCH -e slurm-%j.err-%N              name for stderr; %j is the job number, %N the node name
#SBATCH --constraint=C20                restrict the job to nodes with a given feature (or -C)
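Putting several of these together, a job header might look like the following sketch (the account, partition, and e-mail address are placeholders):

#!/bin/bash
#SBATCH --time=1:00:00                 # one hour wall time
#SBATCH --nodes=2                      # two nodes
#SBATCH --ntasks=12                    # 12 tasks total
#SBATCH --account=baggins              # placeholder account
#SBATCH --partition=kingspeak          # placeholder partition
#SBATCH --mail-type=FAIL,BEGIN,END
#SBATCH --mail-user=name@example.com
#SBATCH -o slurm-%j.out-%N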

Accounts and Partitions
- You need to specify an account and a partition to run jobs
- You can see a list of partitions with the sinfo command
- For general allocation usage the partition is the cluster name
- If you have no allocation (or are out of allocation), use clustername-freecycle as the partition
- Your account is typically your PI's name (e.g., if my PI is Baggins, I use the "baggins" account); there are a few exceptions
- Private node accounts and partitions have the same name: the PI's last name with a cluster abbreviation, e.g., baggins-kp, baggins-em, etc.
- Private nodes can be used as a guest with the "owner-guest" account and the cluster-guest partition
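For example, using the hypothetical PI "baggins" on the kingspeak cluster, the three common cases would look roughly like this (a sketch based on the naming conventions above):

# General allocation
#SBATCH --account=baggins
#SBATCH --partition=kingspeak

# No allocation (or out of allocation)
#SBATCH --account=baggins
#SBATCH --partition=kingspeak-freecycle

# Guest access to privately owned nodes
#SBATCH --account=owner-guest
#SBATCH --partition=kingspeak-guest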

SLURM Environment Variables
- They depend on the SLURM batch directives used
- You can see them for a given set of directives by running the env command inside a script (or in an srun session)
- Some useful variables: $SLURM_JOBID, $SLURM_SUBMIT_DIR, $SLURM_NNODES, $SLURM_NTASKS
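As a small illustration (a sketch, not a CHPC-provided script), these variables can be echoed or used inside a batch script:

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks=4

# Record where and what was submitted
echo "Job $SLURM_JOBID submitted from $SLURM_SUBMIT_DIR"
echo "Running on $SLURM_NNODES nodes with $SLURM_NTASKS tasks"

# List every SLURM_* variable set for this job
env | grep SLURM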

Basic SLURM script flow
1. Set up the #SBATCH directives for the scheduler to request resources for the job
2. Set up the working environment by loading appropriate modules
3. If necessary, add any additional libraries or programs to $PATH and $LD_LIBRARY_PATH, or set other environment needs
4. Set up temporary/scratch directories if needed
5. Switch to the working directory
6. Run the program
7. Clean up any temporary files or directories

Basic SLURM script - bash

#!/bin/bash
#SBATCH --time=02:00:00
#SBATCH --nodes=1
#SBATCH -o slurmjob-%j
#SBATCH --account=owner-guest
#SBATCH --partition=kingspeak-guest

#Set up whatever package we need to run with
module load somemodule

#Set up the temporary directory
SCRDIR=/scratch/general/lustre/$USER/$SLURM_JOBID
mkdir -p $SCRDIR

#Copy input files to the scratch directory
cp file.input $SCRDIR/.
cd $SCRDIR

#Run the program with our input
myprogram < file.input > file.output

#Copy the output back and clean up
cp file.output $HOME/.
cd $HOME
rm -rf $SCRDIR

Basic SLURM script - tcsh

#!/bin/tcsh
#SBATCH --time=02:00:00
#SBATCH --nodes=1
#SBATCH -o slurmjob-%j
#SBATCH --account=owner-guest
#SBATCH --partition=kingspeak-guest

#Set up whatever package we need to run with
module load somemodule

#Set up the scratch directory
set SCRDIR=/scratch/local/$USER/$SLURM_JOBID
mkdir -p $SCRDIR

#Move input files into the scratch directory
cp file.input $SCRDIR/.
cd $SCRDIR

#Run the program with our input
myprogram < file.input > file.output

#Copy the output back and clean up
cp file.output $HOME/.
cd $HOME
rm -rf $SCRDIR

Parallel Execution
- If needed, create the node list:
  srun hostname | sort -u > nodefile.$SLURM_JOBID
  srun hostname | sort > nodefile.$SLURM_JOBID
- MPI installations at CHPC are SLURM aware, so mpirun will work without a machinefile (unless you are manipulating the machinefile in your scripts)
- Alternatively, you can use the srun command instead, but you need to build your code against a sufficiently recent MPI
- Mileage may vary, and for different MPI distributions srun or mpirun may be preferred (check the SLURM page on the CHPC website for more info or email us)
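As a minimal sketch of an MPI batch script (the module name, executable, account, and partition are placeholders):

#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --nodes=2
#SBATCH --ntasks=32
#SBATCH --account=owner-guest          # placeholder account
#SBATCH --partition=kingspeak-guest    # placeholder partition

# Load a SLURM-aware MPI (placeholder module name)
module load somempi

# mpirun picks up the node list from SLURM, so no machinefile is needed
mpirun -np $SLURM_NTASKS ./my_mpi_program

# Or, with a sufficiently recent MPI, launch with srun instead:
# srun ./my_mpi_program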

Running interactive batch jobs
- An interactive session is launched through the srun command:
  srun --time=1:00:00 --nodes=1 --account=chpc --partition=ember --pty /bin/tcsh -l
- Launching an interactive job automatically forwards environment information, including X11 forwarding
- "--pty" must be set to the shell preferred for the session (either /bin/tcsh or /bin/bash)
- -l (lowercase L) at the end is required

Slurm for use of GPU Nodes
- Ember: 11 GPU nodes, each with M2090 cards
- Kingspeak: 4 GPU nodes - 2 nodes with 4 Tesla K80 cards (8 GPUs) and 2 nodes with 8 GeForce TitanX cards
- Use the partition and account set to cluster-gpu
- You must be added to the account; request access via issues@chpc.utah.edu
- Use these nodes only if you are making use of the GPU for the calculation
- Most codes do not yet make efficient use of multiple GPUs, so we have enabled node sharing
- See https://www.chpc.utah.edu/documentation/guides/gpus-accelerators.php

Node Sharing (currently only on GPU nodes)
You need to specify the number of CPU cores, the amount of memory, and the number of GPUs.

Option                        Explanation
#SBATCH --gres=gpu:k80:1      request one K80 GPU (other type names are titanx and m2090)
#SBATCH --mem=4G              request 4 GB of RAM
#SBATCH --mem=0               request all the memory of the node; use this if you do not want to share the node
#SBATCH --ntasks=1            request 1 core
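Combining these options with the GPU partition and account, a shared GPU job header might look like the following sketch (the cluster-gpu names, module, and executable are examples):

#!/bin/bash
#SBATCH --time=02:00:00
#SBATCH --partition=kingspeak-gpu      # example GPU partition
#SBATCH --account=kingspeak-gpu        # example GPU account
#SBATCH --gres=gpu:k80:1               # one K80 GPU
#SBATCH --ntasks=1                     # one CPU core
#SBATCH --mem=4G                       # 4 GB of RAM, leaving the rest of the node for other jobs

module load somemodule                 # placeholder module name
./my_gpu_program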

Slurm Documentation at CHPC
https://www.chpc.utah.edu/documentation/software/slurm.php

Other good documentation sources:
http://slurm.schedmd.com/documentation.html
http://slurm.schedmd.com/pdfs/summary.pdf
http://www.schedmd.com/slurmdocs/rosetta.pdf

Getting Help
- CHPC website and wiki: www.chpc.utah.edu - getting started guide, cluster usage guides, software manual pages, CHPC policies
- Jira ticketing system: email issues@chpc.utah.edu
- Help desk: 405 INSCC, 581-6440 (9-5, M-F)
- We use chpc-hpc-users@lists.utah.edu for sending messages to users; we also have Twitter accounts for announcements: @CHPCOutages and @CHPCUpdates