Slurm Version 18.08 Overview


Slurm Version 18.08 Overview
Brian Christiansen, SchedMD
Slurm User Group Meeting 2018

Schedule
- Previous major release: 17.11 (November 2017)
- Latest major release: 18.08 (August 2018)
- Next major release planned: 19.05 (May 2019)

18.08 Highlights
- Heterogeneous environments
- Burst buffer enhancements
- Fault tolerance
- Cloud computing
- Queue stuffing
- New TRES
- New TRES reporting options
- And more...

Heterogeneous Job Steps (MPI)
- A single MPI_COMM_WORLD can span multiple heterogeneous job components
- CPUs, memory, GPUs, etc. can vary for each job component
- Each component of the job internally has a separate step record
- MPI is presented with a view of one single job, spanning all Slurm job components, using environment variables
- Not compatible with Cray MPI or ALPS libraries
- sacct can report all components with the --whole-hetjob=yes option
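As an illustration (not from the slides), a heterogeneous job is launched by separating its components with ":" so that a single MPI_COMM_WORLD spans both; the resource values and the program name my_mpi_app are hypothetical:
$ srun --ntasks=1 --cpus-per-task=16 --mem=64G : \
       --ntasks=32 --cpus-per-task=1 --mem=2G ./my_mpi_app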

Heterogeneous Features Specifications
- Applies to both advanced reservations and job requests
- Adds support for parentheses and counts
- Favors use of nodes with features currently available
- Avoids KNL reboots for the requested MCDRAM or NUMA mode
$ sbatch --constraint=[(knl&snc4&flat)*4&haswell*1] ...

Heterogeneous Features (continued)
- A job can specify the required features of its batch host
$ sbatch --constraint=[(knl&snc4&flat)*4&haswell*1] --batch=haswell ...

KNL Enhancements
- Added node and partition CpuBind configuration parameters (in slurm.conf) to control default task binding
- Added NumaCpuBind configuration parameter to knl.conf; changes a node's default CPU bind mode based upon the KNL NUMA mode
- Added ValidateMode to knl.conf for static KNL configurations
- Added NodeRebootWeight configuration parameter to knl.conf
- Report node feature plugin configuration with scontrol show config
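A minimal sketch of where these parameters might live; the node names and values are assumptions, not taken from the slides:
# slurm.conf: default task binding for KNL nodes and their partition
NodeName=nid[00001-00016] CpuBind=cores
PartitionName=knl Nodes=nid[00001-00016] CpuBind=cores

# knl.conf
ValidateMode=1
NodeRebootWeight=65533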

Burst Buffers
- Added scontrol show dwstat command to report burst buffer status
- Added GetSysStatus field to burst_buffer.conf
- Added new job state SO/STAGE_OUT while stage-out is in progress
- Added new burst buffer state teardown-fail
- Burst buffer errors are logged to the new job SystemComment field
- Enable jobs with a zero node count for creation or deletion of persistent burst buffers
- Add SetExecHost flag to burst_buffer.conf to enable access from the login node for interactive jobs
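A hedged sketch of how these options might appear in a Cray DataWarp setup; the dwstat path shown is an assumption for illustration:
# burst_buffer.conf
GetSysStatus=/opt/cray/dws/default/bin/dwstat
Flags=SetExecHost

# Report burst buffer status collected via dwstat
$ scontrol show dwstat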

Arbitrary Count of slurmctld Backups
- Configuration parameters ControlMachine, BackupController, etc. are replaced with an ordered list of hostnames and IP addresses (old options are still supported)
- Added SlurmctldPrimaryOnProg and SlurmctldPrimaryOffProg configuration options to run scripts when the primary/backup changes; the scripts can change IP routing if desired
# Excerpt of slurm.conf file
SlurmctldHost=head1(12.34.56.78)
SlurmctldHost=head2(12.34.56.79)
SlurmctldHost=head3(12.34.56.80)
SlurmctldAddr=10.120.10.1
SlurmctldPrimaryOnProg=/opt/slurm/default/etc/primary_on
SlurmctldPrimaryOffProg=/opt/slurm/default/etc/primary_off

Cloud Enhancements
- ResumeFailProgram: the program that will be executed when nodes fail to resume by ResumeTimeout. The argument to the program is the names of the failed nodes (using Slurm's hostlist expression format).
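A minimal sketch of wiring this up; the path and the script body are assumptions for illustration:
# slurm.conf
ResumeFailProgram=/opt/slurm/default/etc/resume_fail.sh

# /opt/slurm/default/etc/resume_fail.sh (hypothetical)
#!/bin/bash
# $1 is the hostlist expression of the nodes that failed to resume
logger -t slurm "resume failed for nodes: $1"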

Queue Stuffing
New association/QOS options:
- GrpJobsAccrue=<max jobs>: maximum number of pending jobs in aggregate able to accrue age priority for this association and all associations which are children of this association
- MaxJobsAccrue=<max jobs>: maximum number of pending jobs able to accrue age priority at any given time for the given association; this is overridden if set directly on a user, and the default is the cluster's limit
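These limits would typically be managed with sacctmgr; a hedged example in which the account name, user name, and values are hypothetical:
$ sacctmgr modify account where name=physics set GrpJobsAccrue=50
$ sacctmgr modify user where name=alice set MaxJobsAccrue=10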

New TRES
- New TRES: fs/disk, fs/lustre, ic/ofed, vmem, pages
- New default TRES: cpu, mem, energy, node, billing, fs/disk, vmem, pages
- AcctGather{Filesystem|Infiniband}Type is not just for profiling anymore; fs/lustre and/or ic/ofed must be defined in AccountingStorageTRES
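A hedged slurm.conf sketch that would collect the filesystem and interconnect TRES with the Lustre and OFED gather plugins; whether these plugins suit a given site is an assumption:
# slurm.conf
AccountingStorageTRES=fs/lustre,ic/ofed
AcctGatherFilesystemType=acct_gather_filesystem/lustre
AcctGatherInfinibandType=acct_gather_infiniband/ofed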

New TRES Reporting Options
- TRESUsage{In|Out}{Ave|Min|Max|Tot}
  NOTE: When used with Ave[RSS|VM]Size or their values in TRESUsageIn[Ave|Tot], they represent the average/total of the highest watermarks over all ranks in the step. When used with sstat, they represent the average/total at the moment the command was run.
  NOTE: TRESUsage*Min* values represent the lowest high-water mark in the step.
- TRESUsage{In|Out}{MinNode|MinTask|MaxNode|MaxTask}: the node/task that reached the min/max usage
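For example, the new fields could be pulled from accounting with sacct; the job ID and the field selection are illustrative:
$ sacct -j 12345 --format=JobID,TRESUsageInMax,TRESUsageInMaxNode,TRESUsageInMaxTask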

Other, for users
- Disable local PTY output processing when using srun --unbuffered; prevents \r insertion into the output string
- Avoid terminating other processes in a task group when any task is core dumping, to avoid incomplete OpenMP core files
- The srun command returns the highest signal of any task
- Append "with requeued tasks" to the end of the job array end email when any task is requeued; this is a hint to use sacct --duplicates to see all job accounting information

Other, for users (continued)
- Add salloc/sbatch/srun option --gres-flags=disable-binding to disable filtering of CPUs with respect to generic resource locality; this option is currently required to use more CPUs than are bound to a GRES (i.e. if a GPU is bound to the CPUs on one socket, but resources on more than one socket are required to run the job); see the sketch below
- Add the name of the partition used to the output of srun --test-only; valuable for jobs submitted to multiple partitions
- Add job reason "ReqNodeNotAvail, reserved for maintenance"
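A hedged example of disable-binding; the GPU and CPU counts and the program name are hypothetical:
$ srun --gres=gpu:1 --ntasks=1 --cpus-per-task=24 --gres-flags=disable-binding ./my_app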

Other, for administrators
- scontrol reboot enhancements (see the example below):
  - Add ability to specify the node reason
  - Add ability to specify the node state after reboot completion
  - Consider booting nodes as available for backfill future scheduling and do not replace them in advanced reservations
- sdiag command enhancements:
  - Report outgoing message queue contents
  - Report pending job count
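A sketch of the new reboot options; the node name, reason text, and next state are assumptions:
$ scontrol reboot ASAP nextstate=RESUME reason="kernel update" nid00012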

Other, for administrators (continued)
- Explicitly shut down the slurmd daemon when a reboot is requested: SlurmdParameters=shutdown_on_reboot
- Add scontrol ability to create/update TRESBillingWeights (see the example below)
- Calculate TRES billing values at job submission to enforce the QOS DenyOnLimit configuration
- Cray: Add CheckGhalQuiesce to CommunicationParameters in slurm.conf
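A hedged example of updating billing weights on a partition; the partition name and weights are hypothetical:
$ scontrol update PartitionName=gpu TRESBillingWeights="CPU=1.0,Mem=0.25G,GRES/gpu=2.0"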

Other, for administrators (continued)
- The default AuthType for slurmdbd is now "auth/munge"
- User-defined triggers are now disabled by default; added the SlurmctldParameters option allow_user_triggers to enable them
- The SchedulerParameters option "whole_pack" has been renamed to "whole_hetjob" (the old option is still supported)

Other, for administrators (continued)
- ConstrainKmemSpace is now disabled by default due to Linux kernel resource leaks
- cgroup_allowed_devices_file.conf is no longer required
- Add acct_gather_profile/influxdb plugin to store job profiling information in InfluxDB rather than HDF5 (see the sketch below)
- Added MinPrioThreshold on a QOS; overrides the bf_min_prio_reserve slurm.conf parameter
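A hedged sketch of enabling the InfluxDB profiling plugin; the host, database, and retention policy values are assumptions:
# slurm.conf
AcctGatherProfileType=acct_gather_profile/influxdb

# acct_gather.conf
ProfileInfluxDBHost=influx.example.com:8086
ProfileInfluxDBDatabase=slurm_profiling
ProfileInfluxDBDefault=ALL
ProfileInfluxDBRTPolicy=autogen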

19.05
- Completion of cons_tres plugin
- Slurm + GCP enhancements

Questions?