Using the SLURM Job Scheduler

1 Using the SLURM Job Scheduler [web] portal.biohpc.swmed.edu

2 Overview Today we're going to cover: Part I: What is SLURM, and how to use a basic set of SLURM commands. Part II: How to write an sbatch script for job submission (with demos). Part III: Things you need to know before running multi-threading, MPI, and GPU jobs. This is only an introduction, but it should give you a good start.

3 Part I : What is SLURM? Simple Linux Utility for Resource Management - Started as a simple resource manager for Linux clusters, about 500,000 lines of C code - Easy to use (e.g. run a.out on a PC; run sbatch a.out on a cluster) - Fair-share resource allocation. SLURM is the glue that lets a parallel computer execute parallel jobs - It makes a parallel computer almost as easy to use as a PC - Such jobs typically use MPI to manage communication within the parallel program

4 Part I : Role of SLURM (resource management & job scheduling)

5 Part I : BioHPC Log in via SSH to nucleus.biohpc.swmed.edu and submit jobs via SLURM. Nucleus005 is the login node; it connects the storage systems (/home2, /project, /work) to the compute nodes: 68 CPU nodes and 8 GPU nodes. You may also submit your job from a workstation or thin-client (separate training session: July 14th).

6 Part I : More about the Login Node (nucleus.biohpc.swmed.edu) Nucleus005, the login node, is the gateway of the BioHPC cluster and a shared resource. At the login node you CAN: view/move/copy/edit files, compile jobs, submit jobs via SLURM, and check job status. You CANNOT: run long-running applications/jobs, or download large data directly.

7 Part I : BioHPC Partitions (or Queues) A partition is a collection of compute nodes. BioHPC has 5 partitions -- 128GB, 256GB, 384GB, super, and GPU -- for a total of 76 compute nodes.

8 Part I : Life cycle of a job after submission Submit a job -> Pending -> Configuring (node booting) -> Running (possibly Resizing or Suspended) -> Completing -> Completed (zero exit code), Failed (non-zero exit code), Cancelled (scancel), or Timeout (time limit reached). BioHPC policy: at most 16 running CPU nodes and 2 running GPU nodes per user.

9 Part I : Set a time limit -- small jobs are easier to fit. The scheduler packs requested jobs into the available nodes over time, so a job with a modest time limit can slot into gaps between larger pending jobs. Rule of thumb: estimated compute time < user-specified time limit < 2 * estimated compute time.

10 Part I : SLURM commands - Before job submission: sinfo, squeue, sview, smap - Submit a job: sbatch, srun, salloc, sattach - While a job is running: squeue, sview, scontrol - After a job has completed: sacct Man pages are available for all commands - The --help option prints brief descriptions of all options - The --usage option prints a list of the options - Almost all options have two formats: a single-letter option (e.g. -p super) and a verbose option (e.g. --partition=super)
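
For example, the two submissions below are equivalent (myjob.sh is just a placeholder script name used for illustration):

> sbatch -p super -N 1 -t 0-01:00:00 myjob.sh
> sbatch --partition=super --nodes=1 --time=0-01:00:00 myjob.sh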

11 Part I : sinfo (reports the status of nodes and partitions) > sinfo (report status in node-oriented form) > sinfo -p 256GB (report status of nodes in the 256GB partition)

12 Part I : Use squeue and scontrol to check job status > squeue > scontrol show job {jobid}

13 Part I : Cancel a job with scancel, and use sacct to check previous jobs > scancel {jobid} > sacct -j {jobid} scontrol gives more detailed information about a job, but only for recent jobs; sacct keeps a complete history of jobs, but with only basic information
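
If the default sacct columns are not enough, a --format field list can be added; a minimal sketch (the fields shown are only an example selection):

> sacct -j {jobid} --format=JobID,JobName,Partition,State,ExitCode,Elapsed,MaxRSS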

14 Part I : List of SLURM commands sbatch : Submit a script for later execution (batch mode) salloc : Create a job allocation and start a shell to use it (interactive mode) srun : Create a job allocation (if needed) and launch a job step (typically an MPI job) sattach : Connect stdin/stdout/stderr to an existing job or job step squeue : Report job and job step status smap : Report system, job or step status with topology; less functionality than sview sview : Report/update system, job, step, partition or reservation status (GTK-based GUI) scontrol : Administrator tool to view/update system, job, step, partition or reservation status sacct : Report accounting information by individual job and job step
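
As an illustration of interactive mode, a short session might look roughly like this (partition, node count and time limit are placeholders):

> salloc -p super -N 1 -t 0-00:30:00
> srun hostname
> exit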

15 Part I : List of valid job states PENDING (PD) : Job is awaiting resource allocation RUNNING (R) : Job currently has an allocation SUSPENDED (S) : Job has an allocation, but execution has been suspended COMPLETING (CG) : Job is in the process of completing COMPLETED (CD) : Job has terminated all processes on all nodes CONFIGURING (CF) : Job has been allocated resources and is waiting for them to become ready for use CANCELLED (CA) : Job was explicitly cancelled by the user or a system administrator FAILED (F) : Job terminated with a non-zero exit code or other failure condition TIMEOUT (TO) : Job terminated upon reaching its time limit NODE_FAIL (NF) : Job terminated due to failure of one or more allocated nodes
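
These state names can also be used to filter squeue output; for example (the -u and -t options are standard squeue flags):

> squeue -u {username} -t PENDING,RUNNING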

16 Part II A little background on the filament-analysis example Testing your code before submitting to the BioHPC cluster Job submission demos: Demo 01: basic structure of an sbatch script Demo 02: submit sequential jobs Demo 03 & 04: submit parallel jobs Demo 05 & 06: submit parallel jobs with srun Demo 07: submit a job with a dependency

17 Part II : Vimentin Filament Analysis Image capture (384-well plate; DAPI, FITC, TRITC channels) -> Steerable filter -> Filament segmentation -> Network analysis (straightness, intensity, length) -> Plot straightness

18 Part II : Testing your job before submission Test your job: on your own machine, at your local workstation/thin-client, or on a reserved BioHPC compute node (CPU job: remotegui or webgui; GPU job: remotegpu or webgpu)

19 Part II : Demo 01 -- Basic structure of a SLURM script

#!/bin/bash
#SBATCH --job-name=singlematlab
#SBATCH --partition=super
#SBATCH --nodes=1
#SBATCH --time=00-00:01:00
#SBATCH --output=single.%j.out
#SBATCH --error=single.%j.err
module add matlab
matlab -nodisplay -nodesktop -r "forbiohpctestplot(1), exit"

The first line runs the script under the bash shell; the #SBATCH lines set up the SLURM environment (the --time format is D-H:M:S); module add loads the software (exports the path and libraries); the remaining line(s) are the command(s) to be executed.
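
Assuming the script above is saved as, say, demo01.sh (the filename is arbitrary), it would be submitted and checked roughly like this:

> sbatch demo01.sh
> squeue -u {username}
> cat single.{jobid}.out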

20 Part II : More SBATCH options

#SBATCH --begin=now+1hour
Defer the allocation of the job until the specified time
#SBATCH --mail-type=ALL
Notify the user by e-mail when certain event types occur (BEGIN, END, FAIL, REQUEUE, etc.)
#SBATCH --mail-user=yi.du@utsouthwestern.edu
The e-mail address used to receive notifications of state changes as defined by --mail-type
#SBATCH --mem={megabytes}
Specify the real memory required per node, in megabytes
#SBATCH --nodelist=nucleus0[10-20]
Request a specific list of node names. The order of the node names in the list is not important; the node names will be sorted by SLURM
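
Putting a few of these together, a job header might look like the sketch below (the memory value and e-mail address are placeholders, not recommendations):

#SBATCH --job-name=myjob
#SBATCH --partition=super
#SBATCH --nodes=1
#SBATCH --time=0-02:00:00
#SBATCH --mem=32768
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=first.last@utsouthwestern.edu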

21 Part II : Demo 02 & Demo 03 -- submit multiple tasks to a single node: sequential tasks vs. parallel tasks (the analysis steps are Filter, Segmentation, Analysis, Plot)

#!/bin/bash
#SBATCH --job-name=matlab
#SBATCH --partition=super
#SBATCH --nodes=1
#SBATCH --time=00-00:01:00
#SBATCH --output=single.%j.out
#SBATCH --error=single.%j.err
module add matlab

For both sequential and parallel tasks, the SLURM environment and the software we need are the same; the difference lies in how you write your commands.

22 Part II : Demo 02 & Demo 03 -- submit multiple tasks to a single node

Demo 02: sequential tasks
# Step 1: Steerable filter
matlab -nodisplay -nodesktop -r "MDFillter(1), exit"
# Step 2: Filament segmentation
matlab -nodisplay -nodesktop -r "vimfilament(1), exit"
# Step 3: Network analysis
matlab -nodisplay -nodesktop -r "MDAnalysis(1), exit"
# Step 4: Plot straightness
matlab -nodisplay -nodesktop -r "forbiohpctestplot(1), exit"

Demo 03: parallel tasks
# send each task to the background
matlab -nodisplay -nodesktop -r "forbiohpctestplot(1), exit" &
matlab -nodisplay -nodesktop -r "forbiohpctestplot(2), exit" &
matlab -nodisplay -nodesktop -r "forbiohpctestplot(3), exit" &
# wait for the background jobs to terminate, then return
wait

23 Part II : Demo 04 -- version 2 of Demo 03

#!/bin/bash
#SBATCH --job-name=multimatlab
#SBATCH --partition=super
#SBATCH --nodes=1
#SBATCH --time=00-00:01:00
#SBATCH --output=multi.%j.out
#SBATCH --error=multi.%j.err
module add matlab
for i in `seq 1 16`
do
    matlab -nodisplay -nodesktop -r "forbiohpctestplot($i), exit" &
done
wait

24 Part II : How many tasks should I submit to each node? Answer: on each node, sockets : cores per socket : threads per core = 2 : 8 : 2 (each physical core provides 2 logical cores/threads), so the total number of parallel tasks available within a single node = 2*8*2 = 32 tasks/node. Both sbatch and srun create a resource allocation to run the job; in addition, srun lets the user specify on which node/core the job is to be executed.
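
You can check this layout yourself; one possible sketch uses sinfo's output-format fields for sockets, cores per socket, and threads per core:

> sinfo -p super -o "%n %X %Y %Z"    (node name, sockets, cores per socket, threads per core)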

25 Part II : Demo 05 -- submit multiple tasks to a single node with srun

#!/bin/bash
#SBATCH --job-name=srunsinglenodematlab
#SBATCH --partition=super
#SBATCH --nodes=1
#SBATCH --ntasks=16
#SBATCH --time=00-00:01:00
#SBATCH --output=srunsinglenode.%j.out
#SBATCH --error=srunsinglenode.%j.err
module add matlab
srun sh script.m

--ntasks gives the total number of tasks in the current job; srun launches one copy of script.m per task.

script.m:
#!/bin/bash
matlab -nodisplay -nodesktop -r "forbiohpctestplot($SLURM_LOCALID+1), exit"

SLURM_LOCALID is an environment variable holding the node-local task ID of the process within the job (zero-based).

26 Part II : Demo 06 -- submit multiple tasks to multiple nodes with srun

#!/bin/bash
#SBATCH --job-name=srun2nodematlab
#SBATCH --partition=super
#SBATCH --nodes=2
#SBATCH --ntasks=16
#SBATCH --time=00-00:01:00
#SBATCH --output=srun2node.%j.out
#SBATCH --error=srun2node.%j.err
module add matlab
srun sh script.m

script.m:
#!/bin/bash
let ID=$SLURM_NODEID*$SLURM_NTASKS/$SLURM_NNODES+$SLURM_LOCALID+1
echo "process data $ID on `hostname`" >> namelist.txt
matlab -nodisplay -nodesktop -r "forbiohpctestplot($ID), exit"

Useful environment variables: SLURM_NODEID is the relative node ID of the current node (zero-based); SLURM_NNODES is the total number of nodes in the job's resource allocation; SLURM_NTASKS is the total number of tasks in the current job; SLURM_LOCALID is the node-local task ID of the process within the job (zero-based).
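
As a quick check of the ID arithmetic: with --nodes=2 and --ntasks=16, each node runs 16/2 = 8 tasks, so node 0 produces IDs 0*8 + (0..7) + 1 = 1..8 and node 1 produces IDs 1*8 + (0..7) + 1 = 9..16, covering data sets 1 through 16 exactly once.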

27 Part II : Demo 07 -- submit a job with a dependency > sbatch test_srun2nodes.sh > sbatch --dependency=afterok:{jobid} gen_gif.sh (gen_gif.sh stays pending until the first job finishes successfully)
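
In a script, the job ID can be captured instead of copied by hand; a small sketch using sbatch's --parsable flag (which prints only the job ID):

jid=$(sbatch --parsable test_srun2nodes.sh)
sbatch --dependency=afterok:$jid gen_gif.sh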

28 Part III Multi-threading jobs on a single node (shared memory) MPI jobs on multiple nodes (distributed memory) GPU jobs on a single node

29 Part III : Demo -- submit a multi-threading job to BioHPC

#!/bin/bash
#SBATCH --job-name=phenix
#SBATCH --partition=super
#SBATCH --nodes=1
#SBATCH --ntasks=30
#SBATCH --time=0-20:00:00
#SBATCH --output=phenix.%j.out
#SBATCH --error=phenix.%j.err
module add phenix/1.9
phenix.den_refine model.pdb data.mtz nproc=30

Q: How big is your data? Choose the proper partition so your data fits in memory.
Q: What is your software's limit on the number of threads? Our cluster's limit is 32 threads/node; use whichever is smaller.
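
A common alternative for a single multi-threaded program is to request one task with many CPUs via --cpus-per-task; a sketch of that layout (not the layout used on the slide above), with the application's thread count matching the CPUs requested:

#!/bin/bash
#SBATCH --job-name=phenix
#SBATCH --partition=super
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=30
#SBATCH --time=0-20:00:00
module add phenix/1.9
phenix.den_refine model.pdb data.mtz nproc=30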

30 Part III : Demo -- submit an MPI job

#!/bin/bash
#SBATCH --job-name=mpi_relion
#SBATCH --partition=super
#SBATCH --nodes=2
#SBATCH --ntasks=8
#SBATCH --time=0-80:00:00
#SBATCH --output=mpi_relion.%j.out
module add relion/gcc/1.3
module add mvapich2/gcc/1.9
mpirun relion_refine_mpi --o Class3D_2classes/run5 --i all_particles.star --particle_diameter ... --angpix ... --ref run1_c1_sort_it025_class001.mrc --firstiter_cc --ini_high 30 --iter ... --tau2_fudge 4 --flatten_solvent --zero_mask --ctf --ctf_corrected_ref --ctf_phase_flipped --sym C1 --K 2 --oversampling 1 --healpix_order 3 --offset_range 5 --offset_step 2 --norm --scale --j 4 --dont_combine_weights_via_disc --dont_centralize_image_reading --limit_tilt 30 --helix ... (further helical-refinement options)

Q: How big is your data? Choose the partition and the number of nodes so your data fits: total tasks / number of nodes <= 32, and memory needed per task * tasks on each node <= 128GB/256GB/384GB.
Q: What is the maximum speed-up you could achieve?
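
To make the per-node arithmetic concrete, a generic MPI submission might look like the sketch below (the program name is a placeholder); --ntasks-per-node is a standard sbatch option that caps how many of the tasks land on each node, keeping it at or below the 32-task limit:

#!/bin/bash
#SBATCH --job-name=my_mpi_job
#SBATCH --partition=super
#SBATCH --nodes=2
#SBATCH --ntasks=64
#SBATCH --ntasks-per-node=32
#SBATCH --time=0-08:00:00
module add mvapich2/gcc/1.9
mpirun ./my_mpi_program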

31 Part III : Demo -- submit a GPU job

#!/bin/bash
#SBATCH --job-name=cuda_test
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --time=0-00:10:00
#SBATCH --output=cuda.%j.out
#SBATCH --error=cuda.%j.err
module add cuda65
./matrixMul -wA=320 -hA=10240 -wB=10240 -hB=320

Jobs will not be allocated any generic resources unless they are specifically requested at job submission time, using the --gres option supported by sbatch and srun. Format: --gres=gpu:[n], where n is the number of GPUs. Use the GPU partition. The matrixMul example multiplies A (320 x 10240) by B (10240 x 320).
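
If a job needs two GPUs on a node, only the request changes (--gres=gpu:2). Inside the job it can be useful to confirm what was allocated; a small sketch, assuming the cluster's GRES configuration exports CUDA_VISIBLE_DEVICES to the job as is typical:

#!/bin/bash
#SBATCH --job-name=cuda_test
#SBATCH --partition=gpu
#SBATCH --gres=gpu:2
#SBATCH --time=0-00:10:00
module add cuda65
echo "Allocated GPU(s): $CUDA_VISIBLE_DEVICES"
nvidia-smi
./matrixMul -wA=320 -hA=10240 -wB=10240 -hB=320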

32 Getting Effective Help -- use the ticket system. What is the problem? Provide any error messages and diagnostic output you have. When did it happen? What time, on the cluster or a client, and with what job ID? How did you run it? What did you run, with what parameters, and what do they mean? Any unusual circumstances? Have you compiled your own software? Do you customize startup scripts? Can we look at your scripts and data? Tell us if you are happy for us to access your scripts/data to help troubleshoot.
