Slurm. Ryan Cox Fulton Supercomputing Lab Brigham Young University (BYU)


1 Slurm Ryan Cox Fulton Supercomputing Lab Brigham Young University (BYU)

2 Slurm Workload Manager
What is Slurm?
Installation
Slurm Configuration
  Daemons
  Configuration Files
  Client Commands
User and Account Management
Policies, tuning, and advanced configuration
  Priorities
  Fairshare
  Backfill
  QOS

3 What is Slurm?
Simple Linux Utility for Resource Management
  Anything but simple
Resource manager and scheduler
Originally developed at LLNL (Lawrence Livermore)
GPL v2
Commercial support/development available
  Core development by SchedMD
  Other major contributors exist
Built for scale and fault tolerance
Plugin-based: lots of plugins to modify Slurm behavior for your needs

4 BYU's Scheduler History
BYU has run several scheduling systems through its HPC history
Moab/Torque was the primary scheduling system for many years
Slurm replaced Moab/Torque as BYU's sole scheduler in January 2013
BYU has now contributed Slurm patches, some small and some large. Examples:
  New fair share algorithm: LEVEL_BASED
  cgroup out-of-memory notification in job output
  Script to generate a file equivalent to PBS_NODEFILE
  Optionally charge for CPU equivalents instead of just CPUs (WIP)

5 Terminology
Partition: a set of nodes (usually a cluster, using the traditional definition of "cluster")
Cluster: multiple Slurm clusters can be managed by one slurmdbd; one slurmctld per cluster
Job step: a suballocation within a job
  E.g. job 1234 has been allocated 12 nodes. It launches 4 job steps that each run on 3 of the nodes.
  Similar to subletting an apartment: sublet the whole place or just a room or two
User: a user ("Bob" has a Slurm user "bob")
Account: a group of users and subaccounts
Association: the combination of user, account, partition, and cluster
  A user can be a member of multiple accounts, with different limits for different partitions and clusters, etc.

6 Installation
Version numbers are Ubuntu-style (e.g. 14.03.x): 14.03 == major version, released in March 2014; .x == minor version
Download official releases from schedmd.com
git repo available; active development occurs at github.com; releases are tagged (git tag)
Two main methods of installation:
  ./configure && make && make install   # and install missing -dev{,el} packages
  Build RPMs, etc.
Some distros have a package, usually slurm-llnl
  Version may be behind by a major release or three
  If you want to patch something, this is the hardest approach

7 Installation: Build RPMs
Set up ~/.rpmmacros with something like this (see top of slurm.spec for more options):
  ##slurm macros
  %_with_blcr 1
  %_with_lua 1
  %_with_mysql 1
  %_with_openssl 1
  %_smp_mflags -j16
  ##%_prefix /usr/local/slurm
Copy missing version info from the META file to slurm.spec (grep for META in slurm.spec)
Let's assume we add the following lines to slurm.spec:
  Name: slurm
  Version: 14.03.4
  Release: 0%{?dist}-custom1
Assuming RHEL 6, the RPM version will become: slurm-14.03.4-0.el6-custom1
If the slurm code is in ./slurm/, do:
  ln -s slurm slurm-14.03.4-0.el6-custom1
  tar hzcvf slurm-14.03.4-0.el6-custom1.tgz slurm-14.03.4-0.el6-custom1
  rpmbuild -tb slurm-14.03.4-0.el6-custom1.tgz
The *.rpm files will be in ~/rpmbuild/RPMS/

8 Configuration: Daemons
Daemons:
  slurmctld: controller that handles scheduling, communication with nodes, etc.
  slurmdbd: (optional) communicates with MySQL database
  slurmd: runs on a compute node and launches jobs
  slurmstepd: run by slurmd to launch a job step
  munged: authenticates RPC calls (install munged everywhere with the same key)
slurmd uses hierarchical communication between slurmd instances (for scalability)
slurmctld and slurmdbd can have primary and backup instances for HA
  State synchronized through a shared file system (StateSaveLocation)
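A quick sanity check after the daemons are installed and the munge key is distributed (this checklist is an illustration, not from the slides):

    # Verify munge authentication works with the shared key
    munge -n | unmunge
    # Verify the controller (and any backup) responds
    scontrol ping
    # Verify compute nodes have registered with slurmctld
    sinfo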

9 Configuration: Config Files
Config files are read directly from the node by commands and daemons
Config files should be kept in sync everywhere
  Exception: slurmdbd.conf is only used by slurmdbd and contains database passwords
  DebugFlags=NO_CONF_HASH tells Slurm to tolerate some differences
  Everything should be consistent except maybe backfill parameters, etc. that slurmd doesn't need
Can use Include /path/to/file.conf to separate out portions, e.g. partitions, nodes, licenses
Can configure generic resources with GresTypes=gpu
man slurm.conf
Easy:
Almost as easy:
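A minimal sketch of how these options might look in slurm.conf (the file paths are assumptions for illustration, not BYU's actual layout):

    # slurm.conf fragment
    DebugFlags=NO_CONF_HASH                # tolerate minor config differences between nodes
    GresTypes=gpu                          # enable generic resource tracking for GPUs
    Include /etc/slurm/partitions.conf     # keep partition definitions in a separate file
    Include /etc/slurm/nodes.conf          # keep node definitions in a separate file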

10 Configuration: Gotchas
SlurmdTimeout: the interval that slurmctld waits for slurmd to respond before assuming a node is dead and killing its jobs
  Set appropriately so file system disruptions and Slurm updates don't kill everything. Ours is 1800 (30 minutes).
Slurm queries the hardware and configures nodes appropriately... may not be what you want if you want Mem=64GB instead of the slightly smaller detected value
  Can set FastSchedule=2
You probably want this: AccountingStorageEnforce=associations,limits,qos
ulimit values at the time of sbatch get propagated to the job: set PropagateResourceLimits if you don't like that
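A hedged slurm.conf sketch of the settings mentioned above; the values are the ones from the slide or placeholders, not a recommendation:

    SlurmdTimeout=1800                                  # 30 minutes; survive FS hiccups and Slurm upgrades
    FastSchedule=2                                      # trust the configured node specs, don't re-detect
    AccountingStorageEnforce=associations,limits,qos    # enforce associations, limits, and QOS access
    PropagateResourceLimits=NONE                        # don't copy submit-host ulimits into the job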

11 Commands
squeue: view the queue
sbatch: submit a batch job
salloc: launch an interactive job
srun: two uses:
  outside of a job: run a command through the scheduler on compute node(s) and print the output to stdout
  inside of a job: launch a job step (i.e. suballocation) and print to the job's stdout
sacct: view job accounting information
sacctmgr: manage users and accounts, including limits
sstat: view job step information (I rarely use)
sreport: view reports about usage (I rarely use)
sinfo: information on partitions and nodes
scancel: cancel jobs or steps, send arbitrary signals (INT, USR1, etc.)
scontrol: list and update jobs, nodes, partitions, reservations, etc.

12 Commands: Read the Manpages
Slurm is too configurable to cover everything here; I will share some examples in the next few slides
New features are added frequently
squeue now has more output options than A-z (printf style); a new output formatting method was added in a recent release

13 Host Range Syntax
Host range syntax is more compact, allows smaller RPC calls, makes config files easier to read, etc.
Node lists have a range syntax with [] using "," and "-"
Usable with commands and config files
n[1-10,40-50] and n[5-20] are valid
Up to two ranges are allowed: n[1-100]-[1-16]
  I haven't tried this out recently so it may have increased; the manpage still says two
Comma-separated lists are allowed: a-[1-5]-[1-2],b-3-[1-16],b-[4-5]-[1-2,7,9]
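The commands can also expand and collapse these ranges for you, which is handy in scripts (node names here are illustrative):

    # Expand a host range into one hostname per line
    scontrol show hostnames n[1-3,7]
    # Collapse a list of hostnames back into range syntax
    scontrol show hostlist n1,n2,n3,n7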

14 Commands: squeue
Want to see all running jobs on nodes n[4-31], submitted by all users in account accte, using QOS special, with a certain set of job names, in reservation res8, but only show the job ID and the list of nodes the jobs are assigned, sorted by time remaining and then descending by job ID? There's a command for that!
squeue -t running -w n[4-31] -A accte -q special -n name1,name2 -R res8 -o "%.10i %N" -S +L,-i
Way too many options to list here. Read the manpage.

15 Commands: sbatch (and salloc, srun)
sbatch parses #SBATCH directives in a job script and accepts parameters on the CLI
  Also parses most #PBS syntax
salloc and srun accept most of the same options
LOTS of options: read the manpage
Easy way to learn/teach the syntax: BYU's Job Script Generator
  LGPL v3, Javascript, available on Github
  Slurm and PBS syntax
  May need modification by your site

16 Script Generator (1/2) (screenshot of the job script generator)

17 Script Generator (2/2) Demo and code: see github.com/byuhpc

18 Commands: sbatch (and salloc, srun)
Short and long versions exist for most options
-N 2 # node count
-n 8 # task count
  default behavior is to pack tasks onto as few nodes as possible rather than spreading them
-t 2-04:30:00 # time limit in d-h:m:s, d-h, h:m:s, h:m, or m
-p p1 # partition name(s): can list multiple partitions
--qos=standby # QOS to use
--mem=24g # memory per node
--mem-per-cpu=2g # memory per CPU
-a # job array
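Putting a few of these together in a batch script; the partition, QOS, and program name are taken from the examples above or invented for illustration:

    #!/bin/bash
    #SBATCH -N 2                    # 2 nodes
    #SBATCH -n 8                    # 8 tasks total
    #SBATCH -t 2-04:30:00           # 2 days, 4.5 hours
    #SBATCH -p p1                   # partition
    #SBATCH --qos=standby           # QOS
    #SBATCH --mem=24g               # memory per node

    srun ./my_mpi_program           # launch a job step across the allocation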

19 Job Arrays
Used to submit homogeneous scripts that differ only by an index number
$SLURM_ARRAY_TASK_ID stores the job's index number (from -a)
An individual job looks like 1234_7, i.e. ${SLURM_JOB_ID}_${SLURM_ARRAY_TASK_ID}
scancel 1234 for the whole array, or scancel 1234_7 for just one job in the array
Prior to 14.11, job arrays are purely for convenience
  One sbatch call; scancel can work on the entire array, etc.
  Internally, one job entry is created for each job array entry at submit time
  Overhead of a job array with 1000 tasks is about equivalent to 1000 individual jobs
Starting in 14.11, a meta job is used internally
  Scheduling code is aware of the homogeneity of the array
  Individual job entries are created once a job is started
  Big performance advantage
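A minimal array job sketch; the index range and input file naming are assumptions for illustration:

    #!/bin/bash
    #SBATCH -a 1-100                 # run 100 array tasks, indices 1..100
    #SBATCH -n 1
    #SBATCH -t 1:00:00

    # Each task processes its own input file based on the array index
    ./process_data input.${SLURM_ARRAY_TASK_ID}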

20 Commands: scontrol
scontrol can list, set, and update a lot of different things
scontrol show job $jobid
scontrol show node $node
scontrol show reservation   # checkjob equivalent
scontrol hold $jobid / scontrol release $jobid   # hold/release ("uhold" allows the user to release)
Update syntax:
  scontrol update JobID=1234 Timelimit=2-0   # set job 1234 to a 2-day time limit
  scontrol update NodeName=n-4-5 State=DOWN Reason="cosmic rays"
Create a reservation:
  scontrol create reservation reservationname=testres nodes=n-[4,7-10] flags=maint,ignore_jobs,overlap starttime=now duration=2-0 users=root
scontrol reconfigure   # reread slurm.conf
LOTS of other options: read the manpage

21 Resource Enforcement
Slurm can enforce resource requests through the OS
CPU:
  task/cgroup uses the cpuset cgroup (best)
  task/affinity pins a task using sched_setaffinity (good, but a user can escape it)
Memory:
  memory cgroup (best)
  polling (polling-based: huge race conditions exist, but much better than nothing; users can escape it)
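A hedged sketch of cgroup-based enforcement; the parameter names are real Slurm options, but treat the combination as an illustration rather than a recommended setup:

    # slurm.conf
    TaskPlugin=task/cgroup          # confine tasks with the cpuset cgroup
    ProctrackType=proctrack/cgroup  # track processes via cgroups

    # cgroup.conf
    ConstrainCores=yes              # enforce the CPU allocation
    ConstrainRAMSpace=yes           # enforce the memory request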

22 QOS
A QOS can be used to:
  Modify job priorities based on QOS priority
  Configure preemption
  Allow access to dedicated resources
  Override or impose limits
  Change charge rate (a.k.a. UsageFactor)
A QOS can have limits: per QOS and per user per QOS
List existing QOS: sacctmgr list qos
Modify: sacctmgr modify qos long set MaxWall=14-0 UsageFactor=
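As a concrete illustration, creating a QOS and letting a user submit with it might look like this (the QOS name and values are hypothetical):

    # Create a QOS with its own priority and wall-time limit
    sacctmgr create qos special Priority=10 MaxWall=14-0
    # Grant a user access to it
    sacctmgr modify user userbob set qos+=special
    # Submit against it
    sbatch --qos=special job.sh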

23 QOS: Preemption
Preemption is easy to configure:
  sacctmgr modify qos normal set preempt=standby
You can set up the following:
  sacctmgr modify qos high set preempt=normal,low
  sacctmgr modify qos normal set preempt=low
GraceTime (optional) guarantees a minimum runtime for preempted jobs
Use AllowQOS to specify which QOS is allowed to run in each partition
If userbob owns all the nodes in partition bobpartition:
  In slurm.conf, set AllowQOS=bobqos,standby in partition bobpartition
  sacctmgr modify user userbob set qos+=bobqos
  sacctmgr modify qos bobqos set preempt=standby
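QOS-based preemption also needs a couple of slurm.conf settings; a minimal sketch, where the preempt mode, node list, and grace time are assumptions rather than the slide's recommendation:

    # slurm.conf
    PreemptType=preempt/qos        # preemption decisions come from QOS "preempt" lists
    PreemptMode=REQUEUE            # or CANCEL/SUSPEND, depending on policy

    # Example partition with restricted QOS access and a 5-minute grace period
    PartitionName=bobpartition Nodes=n-4-[1-8] AllowQOS=bobqos,standby GraceTime=300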

24 User/Account Management
sacctmgr load/dump can be used as a poor way to implement user management
Proper use of sacctmgr in a transactional manner is better and allows more flexibility, though you'll need to integrate it with your account creation process, etc.
A user can be a member of multiple accounts
  Default account can be specified with sacctmgr (DefaultAccount)
Fairshare Shares can be set to favor/penalize certain users
Can grant/revoke access to multiple QOS's
sacctmgr list assoc user=userbob
sacctmgr list assoc user=userbob account=prof7   # filter by user and account
sacctmgr create user userbob Accounts=prof2 DefaultAccount=prof2 Fairshare=
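A transactional account-creation sketch; the account, user, and organization names are made up:

    # Create an account (e.g. per research group), then add a user to it
    sacctmgr add account prof2 Description="Prof 2's group" Organization=physics
    sacctmgr add user userbob Account=prof2 DefaultAccount=prof2
    # Verify the resulting association
    sacctmgr list assoc user=userbob format=user,account,qos,fairshare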

25 User Limits
Limits on CPUs, memory, nodes, time limits, allocated cpus*time, etc. can be set on an association
sacctmgr modify user userbob set GrpCPUs=1024
sacctmgr modify account prof7 set GrpCPUs=2048
On an account, Grp* limits are a limit for the entire account (sum of children)
Max* limits are usually per user or per job
Set a limit to -1 to remove it

26 User Limits: GrpCPURunMins
GrpCPURunMins is a limit on the sum of an association's jobs' (allocated CPUs * time_remaining)
Similar to MAXPS in Moab/Maui
Staggers the start times of jobs
Allows more jobs to start as other jobs near completion
Simulator available:
Download for your own site (LGPL v3):
More info about why we use this:
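To make the accounting concrete: a job holding 16 CPUs with 24 hours remaining contributes 16 * 1440 = 23,040 CPU-minutes toward the limit, and that contribution shrinks every minute the job runs. Setting the limit is a single sacctmgr call (the value below is an arbitrary example):

    # Cap the sum of (allocated CPUs * minutes remaining) for account prof7
    sacctmgr modify account prof7 set GrpCPURunMins=1000000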

27 GrpCPURunMins plots: simulated 1-core jobs with 7-day and 3-day walltimes under a GrpCPURunMins limit

28 Account Coordinator
An account coordinator can do the following for users and subaccounts under the account:
  Set limits (CPUs, nodes, walltime, etc.)
  Modify fairshare Shares to favor/penalize certain users
  Grant/revoke access to a QOS*
  Hold and cancel jobs
We set faculty to be account coordinators for their accounts
End-user documentation:
* Any QOS:
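Making someone a coordinator is a one-liner (account and user names are hypothetical):

    # Let userbob manage limits, fairshare, and jobs for everything under account prof7
    sacctmgr add coordinator account=prof7 names=userbob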

29 Allocation Management
BYU does not use it, therefore I don't know much about it
GrpCPUMins (different than GrpCPURunMins)
  GrpCPUMins: the total number of CPU minutes that can possibly be used by past, present, and future jobs running from this association and its children
  Can be reset manually or periodically. See PriorityUsageResetPeriod
A QOS can have a UsageFactor that makes it so you get billed more or less depending on QOS: 5.0 for immediate, 1.0 for normal, 0.1 for standby
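A hedged sketch of a simple allocation setup along these lines; the bank size is an example only, and the charge rate mirrors the standby figure above:

    # Give account prof7 a bank of 10 million CPU-minutes
    sacctmgr modify account prof7 set GrpCPUMins=10000000
    # Charge standby jobs at a tenth of the normal rate
    sacctmgr modify qos standby set UsageFactor=0.1

    # slurm.conf: reset accumulated usage monthly
    PriorityUsageResetPeriod=MONTHLY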

30 Job Priorities
The priority/multifactor plugin uses weights * values:
  priority = sum(configured_weight_int * actual_value_float)
Weights are integers; the values themselves are floats (between 0.0 and 1.0)
Available components:
  Age (queue wait time)
  Fairshare
  JobSize
  Partition
  QOS

31 Job Priorities: Example
Let's say the weights are:
  PriorityWeightAge=0
  PriorityWeightFairshare=10000 (ten thousand)
  PriorityWeightJobSize=0
  PriorityWeightPartition=0
  PriorityWeightQOS=10000 (ten thousand)
QOS priorities are: high=5, normal=2, low=0
userbob (fairshare=0.23) submits a job in QOS normal (qos_priority=2):
  priority = (PriorityWeightFairshare * 0.23) + (PriorityWeightQOS * (2 / MAX(qos_priority)))
  priority = (10000 * 0.23) + (10000 * (2/5)) = 2300 + 4000 = 6300

32 Backfill
Can be tuned with SchedulerParameters in slurm.conf
Example: SchedulerParameters=bf_max_job_user=20,bf_interval=60,default_queue_depth=15,max_job_bf=8000,bf_window=14400,bf_continue,max_sched_time=6,bf_resolution=1800,defer
Goal: only backfill a job if it will not delay the start time of any higher-priority job
So many nice tuning parameters pop up all the time that I can't keep up. See the slurm.conf manpage for SchedulerParameters options.

33 Fairshare Algorithms
Warning: sites have widely varying use cases, so I don't necessarily understand the reason for some of the algorithms
The priority/multifactor plugin can use different fair share algorithms
Default (no algorithm override specified with PriorityFlags):
  Fairshare factor affected by you vs. your siblings, your parent versus its siblings, your grandparent versus its siblings, etc.
  FSFactor = 2**(-Usage/Shares)
  Seems to be the most common but it doesn't work for us
PriorityFlags=DEPTH_OBLIVIOUS
  Improves handling of deep and/or unbalanced trees
PriorityFlags=TICKET_BASED
  We used it for a while and it mostly worked, but the algorithm itself is flawed
  LEVEL_BASED recommended as a replacement
PriorityFlags=LEVEL_BASED
  Users in an under-served account will always have a higher fair share factor than users in an over-served account
  E.g. account hogs has higher usage than account idle: all users in idle will have a higher FS factor than all users in hogs
  Available for 14.03 through github.com/byuhpc/slurm; used in production at BYU
  Available upstream in 14.11 (as of pre3)
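Selecting an algorithm is just a slurm.conf setting; a minimal sketch, where the decay half-life and weight are placeholders rather than recommended values:

    # slurm.conf
    PriorityType=priority/multifactor
    PriorityFlags=LEVEL_BASED            # or DEPTH_OBLIVIOUS, TICKET_BASED, or unset for the default
    PriorityDecayHalfLife=14-0           # how quickly historical usage decays
    PriorityWeightFairshare=10000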

34 Job Submit Plugin
Slurm can run a job submit plugin written in Lua
Lua looks like pseudo-code and doesn't take long to learn
The plugin can modify a job's submission based on whatever business logic you want
Example uses:
  Allow access to a partition based on the requested CPU count being a multiple of 3
  Change the QOS to something different based on different factors
  Output a custom error message, such as "Error! You requested x, y, and z but..."
True business logic is possible with this script. It is worth your time to look.

35 Other Stuff
Check out SPANK plugins: they run on a node and can do lots of stuff for job start/end events
Prolog and epilog scripts are available in lots of different ways (job, step, task)

36 User Education
Slurm (mostly) speaks #PBS and has many wrapper scripts. Maybe this is sufficient?
BYU switched from Moab/Torque to Slurm before notifying users of the change
  (Yes, we are that crazy. Yes, it worked great for >95% of use cases, which was our target. The other options/commands were esoteric and silently ignored by Moab/Torque anyway.)
Slurm/PBS Script Generator available: github.com/byuhpc
  LGPL v3, demo linked to from github
"Introduction to Slurm Tools" video is linked from there

37 Diagnostics
Backtraces from core dumps are typically best for crashes
  Be sure you don't have any ulimit-type restrictions on them
For slurmctld:
  gdb `which slurmctld` /var/log/slurm/core
  thread apply all bt
SchedMD is usually able to diagnose problems from backtraces and maybe a few extra print statements they'll ask for
Each component has its own logging level you can specify in its .conf
There are extra flags for slurmctld: scontrol setdebugflags +backfill   # and others, like Priority


39 Support
SchedMD
  Excellent support from the original developers
  Bugfixes typically committed to github within a day
Other support vendors listed on Slurm's Wikipedia page
  Usually tied to a specific hardware vendor or as part of a larger software installation
slurm-dev mailing list: you should subscribe
  Hand-holding is extremely rare
  Don't expect to use slurm-dev for support

40 Recommendations
Requirements documents: don't have your primary scheduler admin write it unless the admin can step back and write what you actually need, rather than "must have features A, B, and C exactly" [even though Slurm may have a better way of accomplishing the same thing]
Think: "I want Prof Bob and his designated favorite students to have access to his privately owned hardware, but I also want preemptable jobs to run on there when they aren't using it. He shouldn't get charged cputime for using his own resources. How should I do that in Slurm?"
  Set AllowQOS=profbob,standby on his partition in slurm.conf
  sacctmgr create qos profbob UsageFactor=0
  Then add each user who should have access to the QOS: sacctmgr modify user $user set qos+=profbob

41 Questions?
