
Submitting batch jobs: Slurm on ecgate
Solutions to the practicals

Xavi Abellan, xavier.abellan@ecmwf.int
User Support Section
Com Intro 2015, ECMWF

Practical 1: Basic job submission

Practicals must be run on ecgate, so make sure you log in there first!

$> ssh ecgate
$> cd $SCRATCH
$> tar xvzf ~trx/intro/batch_ecgate_practicals.tar.gz
$> cd batch_ecgate_practicals/basic

1. Have a look at the script env.sh
2. Submit the job and check whether it is running
   - What QoS is it using?
   - What is the time limit of the job?
3. Where did the output of the job go? Have a look at the output
4. Submit the job again and then, once it starts, cancel it
5. Check the output

Practical 1: Basic job submission

$> cat env.sh
#!/bin/bash
# This is an example of a job dumping the SLURM-related variables in the environment
[ -z "$SLURM_JOBID" ] && echo "This job should be run in batch!" >&2 && exit 1
echo "Current time is `date`"
echo "This should go to the error file..." >&2
echo "I am job id $SLURM_JOBID"
echo "Going to sleep for a couple of minutes..."
sleep 120
echo "Waking up!"
echo "Current time is `date`"
echo "Bye!"

$> ./env.sh
This job should be run in batch!

$> sbatch env.sh
Submitted batch job 1269989

$> squeue -j 1269989
JOBID    NAME    USER  QOS     STATE    TIME  TIMELIMIT   NODELIST(REASON)
1269989  env.sh  trx   normal  RUNNING  0:54  1-00:00:00  ecgb04

$> ls
env.sh  slurm-1269989.out  work
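For steps 4 and 5, a possible sequence is sketched below (the job ID is illustrative; scancel and sacct are standard Slurm commands):

$> sbatch env.sh
Submitted batch job 1269990
$> scancel 1269990                  # cancel it once squeue shows it RUNNING
$> sacct -j 1269990 -o jobid,state  # the job should now be reported as CANCELLED
$> cat slurm-1269990.out            # the output stops where the job was killed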

Practical 1: Basic job submission

Can you modify the previous job so it:
1. runs in the express QoS, with a wall clock limit of 5 minutes?
2. uses the subdirectory work/ as the working directory?
3. sends the
   a) output to the file work/env_out_<jobid>.out?
   b) error to work/env_out_<jobid>.err?
4. sends you an email when the job starts?

Try your job after the modifications and check whether they are correct. You can do the modifications one by one or all at once.

Practical 1: Basic job submission

#!/bin/bash
# This is an example of a job dumping the SLURM-related variables in the environment
#SBATCH --qos=express
#SBATCH --time=5
# Replace trx by your user id
#SBATCH --workdir=/scratch/ectrain/trx/batch_ecgate_practicals/basic/work
#SBATCH --output=env_out_%j.out
#SBATCH --error=env_out_%j.err
#SBATCH --mail-type=begin
[ -z "$SLURM_JOBID" ] && echo "This job should be run in batch!" >&2 && exit 1
echo "Current time is `date`"
echo "This should go to the error file..." >&2
echo "I am job id $SLURM_JOBID"
echo "Going to sleep for a couple of minutes..."
sleep 120
echo "Waking up!"
echo "Current time is `date`"
echo "Bye!"
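One way to verify the modifications after submitting (the job ID is illustrative; %i, %q and %l are standard squeue format codes for job ID, QoS and time limit):

$> sbatch env.sh
Submitted batch job 1270001
$> squeue -j 1270001 -o "%.10i %.10q %.12l"   # should show the express QoS and a 5-minute limit
$> ls work/                                   # output and error files land in work/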

Practical 2: Reviewing past runs

How would you...

- retrieve the list of jobs that you ran today?
$> sacct

- retrieve the list of all the jobs that were cancelled today by user trx?
$> sacct -u trx -s CANCELLED -S 00:00

- ask for the submit, start and end times for a job of your choice?
$> sacct -j 1234567 -o jobid,submit,start,end

- find out the output and error paths for a job of your choice?
$> sacct -j 1234567 -o jobid,comment%190
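sacct accepts further selection and format options that combine with the ones above; for example (the flags are standard Slurm, the date is illustrative):

$> sacct -X -S 2015-03-01 -E now -o jobid,jobname%20,state,elapsed,exitcode
# -X hides the individual job steps and reports one line per job allocation;
# -S/-E bound the time window, -o picks the columns (%20 widens a field)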

Practical 3: Fixing broken jobs

What is wrong in job1? Can you fix it?
- No shebang specified at the beginning of the file
- Spaces between directive and value
- Job name must not contain any space

#!/bin/bash
#SBATCH --job-name=job_1
#SBATCH --output=job1-%j.out
#SBATCH --error=job1-%j.out
#SBATCH --qos=express
#SBATCH --time=00:05:00

# This is the job
echo "I was broken!"
sleep 30
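For reference, the broken directives would have looked something like the sketch below: a reconstruction based on the list of problems above, not the literal original:

#SBATCH --job-name = job 1    # no #!/bin/bash above this, spaces around the value, space in the name
#SBATCH --output = job1-%j.out
#SBATCH --error = job1-%j.out
#SBATCH --qos = express
#SBATCH --time = 00:05:00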

Practical 3: Fixing broken jobs

What is wrong in job2? Can you fix it?
- The QoS name does not exist
- The time limit is wrongly set to 10 days instead of 10 hours

#!/bin/bash
#SBATCH --job-name=job2
#SBATCH --output=job2-%j.out
#SBATCH --error=job2-%j.out
#SBATCH --qos=normal
#SBATCH --time=10:00:00

# This is the job
echo "I was broken!"
sleep 30
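The offending directives would have been along these lines (an illustration; the solution does not show the actual bogus QoS name):

#SBATCH --qos=superfast       # hypothetical name; no such QoS exists on ecgate
#SBATCH --time=10-00:00:00    # D-HH:MM:SS notation: this is 10 days, not 10 hours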

Practical 3: Fixing broken jobs

What is wrong in job3? Can you fix it?
- The output directory does not exist. You must create it before submitting
- Slurm will not expand shell variables in directives. You must replace $SCRATCH with the right value

#!/bin/bash
#SBATCH --job-name=job3
#SBATCH --output=output/job3-%j.out
#SBATCH --error=output/job3-%j.out
#SBATCH --workdir=$SCRATCH/batch_ecgate_practicals/broken
#SBATCH --qos=normal
#SBATCH --time=00:01

# This is the job
echo "I was broken!"
sleep 30
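A possible fix before submitting, assuming the script file is called job3 and using the trx training account from the earlier practicals (replace it with your own user id):

$> cd /scratch/ectrain/trx/batch_ecgate_practicals/broken
$> mkdir -p output                                     # create the missing output directory
$> sed -i 's|\$SCRATCH|/scratch/ectrain/trx|' job3     # hard-code the expanded path in the directive
$> sbatch job3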

Bonus: Migrating from LoadLeveler

$> ll2slurm -i ll_job.sh -o slurm_job.sh
WARNING: ignoring directive 'environment'. The submitting environment is exported by default in Slurm.
WARNING: ignoring directive 'job_cpu_limit'. Slurm uses wall clock instead of cpu time. Please use --time option.
WARNING: No workdir (initialdir) set. Setting it to '/scratch/ectrain/trx'

Original LoadLeveler job (ll_job.sh):
#!/bin/ksh
#@ shell = /usr/bin/ksh
#@ job_name = limits
#@ output = $(job_name).$(jobid).out
#@ error = $(job_name).$(jobid).out
#@ environment = COPY_ALL
#@ job_cpu_limit = 00:03:00,00:02:55
#@ wall_clock_limit = 00:05:00,00:04:50
#@ class = normal
#@ queue
echo "Hello World!"

Converted Slurm job (slurm_job.sh):
#!/bin/ksh
#SBATCH --workdir=/scratch/ectrain/trx
#SBATCH --job-name=limits
#SBATCH --output=limits.%j.out
#SBATCH --error=limits.%j.out
#SBATCH --time=00:05:00
#SBATCH --qos=normal
echo "Hello World!"
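A quick way to double-check the conversion before submitting (plain shell, nothing ECMWF-specific):

$> grep '^#SBATCH' slurm_job.sh   # list the generated Slurm directives
$> sbatch slurm_job.sh            # the job should queue in the normal QoS with a 5-minute wall clock limit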