Exercises: Abel/Colossus and SLURM

Exercises: Abel/Colossus and SLURM. November 08, 2016. Sabry Razick, The Research Computing Services Group, USIT.

Topics: Get access. Running a simple job. Job script. Running a simple job -- qlogin. Customize the batch script. Parallel jobs.

Log into Abel. If on Windows, download Gitk/PuTTY for connecting and WinSCP for copying files. On Unix systems, open a terminal and type: ssh <username>@abel.uio.no. Use your UiO/Notur username and password.
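
From a Unix machine, connecting and copying a file over could look like the following minimal sketch (mydata.txt is a hypothetical file name):
ssh <username>@abel.uio.no
scp mydata.txt <username>@abel.uio.no:~/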

File location /cluster/teaching/abel_tutorial/nov2016/hello.slurm

Running a job on a laptop. GUI: double-click an icon, give parameters or upload data, wait. Terminal: ./myprg input1.txt outputname. Inspect the results.

Running a job on the cluster - 1 (SLURM and Abel). Log in to Abel from your laptop. Request some resources from SLURM. Wait until SLURM grants you the resources. Execute the job as you would on your laptop.

Running a job on the cluster - 2 (SLURM and Abel). Log in to Abel from your laptop. Create a job script with the parameters and the program to run. Hand it over to the workload manager. The workload manager handles the job queue, monitors the progress and lets you know the outcome. Inspect the results.

Interactive login (Qlogin)

qlogin
> qlogin --account=xxx --time=00:10:00
salloc: Pending job allocation 12989428
salloc: job 12989428 queued and waiting for resources
salloc: job 12989428 has been allocated resources
salloc: Granted job allocation 12989428
srun: Job step created
> source /cluster/bin/jobsetup
Starting job 12990135 ("") on c15-6 at Wed Oct 28 18:14:31 CET 2015
> hostname
Running a simple job. Create a directory as follows: mkdir $HOME/RCS_tutorial. Find the project name: projects. Create the job script from the sample at /cluster/teaching/abel_tutorial/nov2016/hello.slurm. Set the permissions so you can edit it: chmod +w hello.slurm. A combined command sequence is sketched below.
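
Put together, a minimal sequence of shell commands for this step could look like this (copying the sample into the new directory with cp is an assumption; the slide only gives the file location):
mkdir $HOME/RCS_tutorial
cd $HOME/RCS_tutorial
projects                                                   # list your project (account) names
cp /cluster/teaching/abel_tutorial/nov2016/hello.slurm .
chmod +w hello.slurm                                       # make your copy editable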

Running a simple job. Correct the project name in the hello.slurm file. Submit the job to Abel: sbatch hello.slurm.

Job script:
#!/bin/bash
# Resources
#SBATCH --job-name=rcs0416_hello
#SBATCH --account=xxx
#SBATCH --time=00:02:00
#SBATCH --mem-per-cpu=256m
#SBATCH --ntasks=1
# Setup
source /cluster/bin/jobsetup
set -o errexit
# The job
echo "Hello from Abel"
sleep 90

Join the queue:
> sbatch hello.slurm
Investigate while the job is running:
> squeue -u $USER
> scontrol show job ####
If you want to cancel the job (don't do this now):
> scancel ####

Investigate:
> scontrol show job 12989353
JobId=12989353 Name=RCS1115_hello
UserId=sabryr(243460) GroupId=users(100)
Priority=22501 Nice=0 Account=staff QOS=staff
JobState=COMPLETED Reason=None...
RunTime=00:00:02 TimeLimit=00:01:00
Command=../RCS_tutorial/hello.slurm
WorkDir=../RCS_tutorial
StdErr=../RCS_tutorial/slurm-12989552.out
StdOut=../RCS_tutorial/slurm-12989552.out

Introduce an error:
#!/bin/bash
#SBATCH --job-name=rcs1115_hello
#SBATCH --account=xxx
#SBATCH --time=00:02:00
#SBATCH --mem-per-cpu=512m
source /cluster/bin/jobsetup
set -o errexit
xxecho "Hello from Abel"
echo "Hello2 from Abel"

Investigate the error file and debug.

User scratch directory for faster IO

$SCRATCH
#!/bin/bash
#SBATCH --job-name=rcs0416_usescratch
#SBATCH --account=xxx
#SBATCH --time=00:00:15
#SBATCH --mem-per-cpu=1g
#SBATCH --ntasks=1
source /cluster/bin/jobsetup
set -o errexit
echo $SUBMITDIR
echo $SCRATCH
cp $SUBMITDIR/input* $SCRATCH
cd $SCRATCH                      # work in the scratch area
chkfile output.txt               # mark output.txt to be copied back when the job ends
cat input* > output.txt

sbatch - memory. #SBATCH --mem-per-cpu=<#g/m>: memory required per allocated core (format: 2G or 2048M). How much memory should one specify? The maximum RAM usage of your program (plus some margin). Exaggerated values might delay the job start. #SBATCH --partition=hugemem: use this if you need more than 61 GB of RAM on a single node (up to 1 TiB).
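
As an illustration, a header for a single-task job needing far more than 61 GB might look like this (the job name, account and sizes are placeholders):
#SBATCH --job-name=bigmem_example
#SBATCH --account=xxx
#SBATCH --partition=hugemem
#SBATCH --mem-per-cpu=128G
#SBATCH --ntasks=1
#SBATCH --time=01:00:00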

sbatch - time. #SBATCH --time=hh:mm:ss: wall clock time limit for the job. Some prior testing is necessary; one might, for example, test on smaller data sets and extrapolate. As with memory, unnecessarily large values might delay the job start, and they cost you: the requested time counts against your allocated resources and stays reserved until the job finishes. #SBATCH --begin=hh:mm:ss: start the job at a given time (or later). #SBATCH --partition=long: the maximum time for a normal job is 1 week (168 hours); if more is needed, use the hugemem or long partitions.
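
For instance, a header that runs a long job and delays its start until 20:00 could look like this (the account and the specific values are placeholders):
#SBATCH --account=xxx
#SBATCH --partition=long
#SBATCH --time=200:00:00
#SBATCH --begin=20:00:00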

sbatch CPUs and nodes. Does your program support more than one CPU? If so, do the CPUs have to be on a single node? How many CPUs will the program run efficiently on? #SBATCH --nodes=nodes: number of nodes to allocate. #SBATCH --ntasks-per-node=cores: number of cores to allocate within each allocated node. #SBATCH --ntasks=cores: total number of cores to allocate. #SBATCH --cpus-per-task=cores: threads on one node.

Supermicro X9DRT compute node: dual Intel E5-2670 (Sandy Bridge) CPUs running at 2.6 GHz (2 sockets), 16 physical compute cores. Each node has 64 GiB of Samsung DDR3 memory.

[Diagram: with #SBATCH --ntasks=8, the eight cores may be placed all on one node, split across two nodes (e.g. 2 + 6), or spread as one core on each of eight nodes.]

Calculate CPU hours:
KMEM = 4580.2007628294   (from /cluster/var/accounting/pe_factor)
PE = NumCPUs
if (MinMemoryCPU > KMEM) { PE = PE * (MinMemoryCPU / KMEM) }
PE_hours = PE * TimeLimit / 3600
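
A small bash sketch of the same formula, using bc for the floating-point arithmetic; the input values are hypothetical, and TimeLimit is taken in seconds to match the division by 3600:
#!/bin/bash
KMEM=4580.2007628294           # pe_factor (MiB per core)
NUMCPUS=4                      # hypothetical: cores requested
MINMEMORYCPU=$((15 * 1024))    # hypothetical: 15g per core, in MiB
TIMELIMIT=3600                 # hypothetical: 1 hour, in seconds
PE=$NUMCPUS
if (( $(echo "$MINMEMORYCPU > $KMEM" | bc -l) )); then
    PE=$(echo "$PE * $MINMEMORYCPU / $KMEM" | bc -l)
fi
echo "PE_hours = $(echo "$PE * $TIMELIMIT / 3600" | bc -l)"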

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
8 X 1 = 8 cores  (*all tasks share the memory of one node)

#SBATCH --ntasks=2
#SBATCH --cpus-per-task=8
8 X 2 = 16 cores

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
2 X 4 = 8 cores

#SBATCH --nodes=1
#SBATCH --time=01:00:00
#SBATCH --ntasks-per-node=4
#SBATCH --mem-per-cpu=15g
4 cores used + 12 idle = 16  (*all of the node's memory is occupied, so the whole node is effectively reserved)
KMEM = 4580.2007628294
PE = 4; since (15 * 1024) > KMEM, PE = 4 * ((15 * 1024) / KMEM) = 13.41
PE_hours = 13.41 * (1 * 60 * 60) / 3600 = 13.41
**Use the command cost to check the account balance.

sbatch CPUs and nodes. If you just need some CPUs, no matter where: #SBATCH --ntasks=17. If you need a specific number of CPUs on each node: #SBATCH --nodes=8 --ntasks-per-node=4. If you need the CPUs on a single node: #SBATCH --nodes=1 --ntasks-per-node=8. If you want a node only for you: #SBATCH --nodes=1 --exclusive. A sketch of a complete MPI job script follows.
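
To make the CPU options concrete, here is a sketch of a small MPI job script; the program name my_mpi_prog and the module name are hypothetical, and the MPI module to load depends on how the program was built:
#!/bin/bash
#SBATCH --job-name=mpi_example
#SBATCH --account=xxx
#SBATCH --time=00:30:00
#SBATCH --mem-per-cpu=512m
#SBATCH --ntasks=8                 # 8 MPI ranks; SLURM decides where they run
source /cluster/bin/jobsetup
set -o errexit
module load openmpi                # hypothetical module name; load the MPI your program was built with
mpirun ./my_mpi_prog input1.txt    # mpirun picks up the allocation from SLURM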

sbatch - constraint. #SBATCH --constraint=feature: run the job on nodes with a certain feature, e.g. ib or rackN. #SBATCH --constraint=ib: run the job on nodes with InfiniBand (rather than gigabit Ethernet); all nodes on Abel are equipped with InfiniBand (56 Gbit/s). Select it if you run MPI jobs. #SBATCH --constraint=ib&rack21: use this form if you need more than one constraint; with multiple --constraint specifications, the later one overrides the earlier.

sbatch - files. #SBATCH --output=file: send stdout (and stderr) to the specified file (instead of slurm-xxx.out). #SBATCH --error=file: send stderr to the specified file. #SBATCH --input=file: read stdin from the specified file.

sbatch - low priority. #SBATCH --qos=lowpri: run a job in the lowpri queue. Even if all of your project's CPUs are busy, you may utilize other CPUs. Such a job may be terminated and put back into the queue at any time, so if possible your job should save its state regularly and be prepared to pick up where it left off. Note: Notur projects cannot access lowpri.
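
One simple pattern for such a restartable job is to write progress to a checkpoint file and resume from it after a requeue; a minimal sketch, where checkpoint.txt, the step count and do_one_step are all hypothetical:
#!/bin/bash
#SBATCH --job-name=lowpri_example
#SBATCH --account=xxx
#SBATCH --qos=lowpri
#SBATCH --time=01:00:00
#SBATCH --mem-per-cpu=512m
source /cluster/bin/jobsetup
set -o errexit
STEP=0
if [ -f checkpoint.txt ]; then
    STEP=$(cat checkpoint.txt)      # resume from the last completed step
fi
while [ "$STEP" -lt 100 ]; do
    ./do_one_step "$STEP"           # hypothetical program doing one unit of work
    STEP=$((STEP + 1))
    echo "$STEP" > checkpoint.txt   # record progress so a requeued job can resume
done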

Check variables:
#!/bin/bash
#SBATCH --job-name=rcs1115_hello
#SBATCH --account=staff
#SBATCH --time=00:01:05
#SBATCH --mem-per-cpu=512m
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
source /cluster/bin/jobsetup
echo "SLURM_JOBID=" $SLURM_JOBID
echo "SCRATCH=" $SCRATCH
echo "SLURM_NPROCS=" $SLURM_NPROCS
echo "SLURM_CPUS_ON_NODE=" $SLURM_CPUS_ON_NODE
echo "SUBMITDIR=" $SUBMITDIR
echo "TASK_ID=" $TASK_ID

Don't: make the same request twice; the last instruction overrides the previous one (maybe), e.g. #SBATCH --nodes=4 followed by #SBATCH --nodes=2. Don't request far more than you actually need (identify the bottlenecks). Don't try to make an inefficient program go faster by pumping in more resources. Don't put any other instruction before all the #SBATCH instructions have been given.

Parallelizing example. Task: 1. Read the input file. 2. Change all A->T, T->A, C->G, G->C. 3. Write this to a new file. Serial way of doing this: read everything (GACCTGGCTG GACCTGGCTG GACCTGGCTG GACCTGGCTG) into memory, translate it, and write the result (CTGGACCGAC CTGGACCGAC CTGGACCGAC CTGGACCGAC) to a file.

Parallelizing example. With lines 1.GACCTGGCTG 2.GACCTGGCTG 3.GACCTGGCTG 4.GACCTGGCTG, one process reads lines 1 and 2 while another reads lines 3 and 4; each translates its part and writes to its own file (file 1 and file 2), and the results are combined into CTGGACCGAC CTGGACCGAC CTGGACCGAC CTGGACCGAC. Reading with 2 processes at the same time could be nearly twice as fast and requires less memory per process. A small sketch follows.
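
A minimal bash sketch of this idea, splitting a hypothetical 4-line input.txt into two halves, translating each half in the background with tr, and combining the results:
#!/bin/bash
set -o errexit
split -l 2 input.txt part_            # produces part_aa and part_ab
tr 'ATCG' 'TAGC' < part_aa > out_1 &  # translate the first half in the background
tr 'ATCG' 'TAGC' < part_ab > out_2 &  # translate the second half in parallel
wait                                  # wait for both background processes
cat out_1 out_2 > output.txt          # combine the results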

What software is available? On Abel, issue the following command to list modules: module avail (or module avail blast for a specific package). Currently loaded modules: module list. Clear the loaded modules: module purge. See http://www.uio.no/english/services/it/research/hpc/abel/help/software/

Thank you. http://www.uio.no/english/services/it/research/hpc/abel/ hpc-drift@usit.uio.no