MSLURM SLURM management of multiple environments

1. Introduction

This document is intended as an overview of MSLURM for administrators. In an ordinary SLURM installation, a single SLURM cluster is served by one SLURM master ("SLURM Control Node"). Accordingly, certain daemons run on the SLURM master: the SLURM control daemon ("slurmctld") and, with the recommended use of a MySQL database, also a SLURM database daemon ("slurmdbd") and a MySQL server daemon ("mysqld"). If several SLURM clusters or several SLURM databases are to be managed by a single SLURM master, an appropriate superstructure becomes necessary.

2. Idea

MSLURM provides such a superstructure for the management of multiple SLURM environments. With it, several SLURM clusters - even across multiple SLURM databases - can run in parallel on one SLURM master and can be administered in an easy and elegant manner. The MSLURM scripts are used only on the single merged SLURM master; the configuration of the compute or login nodes does not have to be changed. MSLURM exploits the fact that a single SLURM environment can be configured so that no conflicts arise with other SLURM environments, by
o Defining different ports in the configuration files ("slurm.conf", "slurmdbd.conf")
o Creating new home directories for each cluster configuration.

3. Implementation

The script "mslurm" ("/usr/sbin/mslurm"), and likewise "mslurmdbd" ("/usr/sbin/mslurmdbd"), enables the management of individual or multiple SLURM daemons as well as the execution of SLURM commands. To achieve this, MSLURM uses the environment variable SLURM_CONF to point each cluster to the location of its own slurm.conf file. The file mslurm.conf ("/etc/slurm/mslurm.conf") contains the setup information for all the clusters and databases that run on the MSLURM master. Furthermore, suitably modified SLURM startup scripts ("/etc/init.d/slurm", "/etc/init.d/slurmdbd") belong to MSLURM, because SLURM supports the variable SLURM_CONF in its own startup scripts, as described in the SLURM documentation.
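As an illustration of the SLURM_CONF mechanism described above, the following minimal sketch runs standard SLURM commands against one particular cluster only; the path follows the directory layout used in the configuration example of section 7.

# Run ordinary SLURM commands against the cluster "foo" only, by pointing
# SLURM_CONF at that cluster's private configuration file.
SLURM_CONF=/etc/slurm/cluster=foo/slurm.conf sinfo
SLURM_CONF=/etc/slurm/cluster=foo/slurm.conf scontrol show config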

These modified SLURM startup scripts are used exclusively by the MSLURM startup scripts ("/etc/init.d/mslurm", "/etc/init.d/mslurmdbd", or in short form "/usr/sbin/rcmslurm", "/etc/init.d/rcslurmdbd"), which manage the MSLURM environment as a whole, i.e. all defined objects.

4. MySQL

MySQL supports running several independent MySQL databases in parallel ("mysqld_multi"). The MySQL databases are used one-to-one by the SLURM databases and are managed by an ordinary multi-instance MySQL installation ("/etc/mysql/mysql.conf").

5. Internal Dependencies

When managing an MSLURM environment, take the internal logical dependencies at boot time into account:

[slurmd <-] *) slurmctld <- slurmdbd <- mysqld_multi

as given by the installed boot scripts ("/etc/init.d/{mysqld_multi,slurmdbd,slurm}").
*) An MSLURM master can also function as a normal SLURM node.

6. Multiple Configurations

Each SLURM environment within an MSLURM environment is configured just like an ordinary SLURM environment, except for different ports and directories. MSLURM uses simple strings to distinguish the SLURM environments by name. To distinguish the configuration files ("slurm.conf", "slurmdbd.conf") of the individual SLURM environments, the files associated with the different configurations have to be placed in different directories (whose names depend on the cluster name).

7. Configuration Example

An individual MSLURM directory structure for the SLURM clusters "foo" and "bar", which share the database "baz", and the SLURM clusters "cluster1b", "cluster2b" and "cluster3b", which share the database "unionb", could be obtained with the following SLURM configuration files:

/etc/slurm/db=baz/slurmdbd.conf:
LogFile=/var/log/slurmdbd-baz.log
PidFile=/var/run/slurmdbd-baz.pid
DbdPort=9119

/etc/slurm/cluster=foo/slurm.conf:
StateSaveLocation=/var/slurm-foo
SlurmdSpoolDir=/var/slurmd-foo
SlurmctldPidFile=/var/run/slurmctld-foo.pid
SlurmdPidFile=/var/run/slurmd-foo.pid
SlurmctldLogFile=/var/log/slurmctld-foo.log
SlurmdLogFile=/var/log/slurmd-foo.log
SlurmctldPort=9117
SlurmdPort=9118
AccountingStoragePort=9119

/etc/slurm/cluster=bar/slurm.conf:
StateSaveLocation=/var/slurm-bar
SlurmdSpoolDir=/var/slurmd-bar
SlurmctldPidFile=/var/run/slurmctld-bar.pid
SlurmdPidFile=/var/run/slurmd-bar.pid
SlurmctldLogFile=/var/log/slurmctld-bar.log
SlurmdLogFile=/var/log/slurmd-bar.log
SlurmctldPort=9127
SlurmdPort=9128
AccountingStoragePort=9119

/etc/slurm/db=unionb/slurmdbd.conf:
LogFile=/var/log/slurmdbd-unionb.log
PidFile=/var/run/slurmdbd-unionb.pid
DbdPort=9219

/etc/slurm/cluster=cluster1b/slurm.conf:
StateSaveLocation=/var/slurm-cluster1b
SlurmdSpoolDir=/var/slurmd-cluster1b
SlurmctldPidFile=/var/run/slurmctld-cluster1b.pid
SlurmdPidFile=/var/run/slurmd-cluster1b.pid
SlurmctldLogFile=/var/log/slurmctld-cluster1b.log
SlurmdLogFile=/var/log/slurmd-cluster1b.log
SlurmctldPort=9217
SlurmdPort=9218
AccountingStoragePort=9219

/etc/slurm/cluster=cluster2b/slurm.conf:
StateSaveLocation=/var/slurm-cluster2b
SlurmdSpoolDir=/var/slurmd-cluster2b
SlurmctldPidFile=/var/run/slurmctld-cluster2b.pid
SlurmdPidFile=/var/run/slurmd-cluster2b.pid
SlurmctldLogFile=/var/log/slurmctld-cluster2b.log
SlurmdLogFile=/var/log/slurmd-cluster2b.log
SlurmctldPort=9227
SlurmdPort=9228
AccountingStoragePort=9219

/etc/slurm/cluster=cluster3b/slurm.conf:
StateSaveLocation=/var/slurm-cluster3b
SlurmdSpoolDir=/var/slurmd-cluster3b
SlurmctldPidFile=/var/run/slurmctld-cluster3b.pid
SlurmdPidFile=/var/run/slurmd-cluster3b.pid
SlurmctldLogFile=/var/log/slurmctld-cluster3b.log
SlurmdLogFile=/var/log/slurmd-cluster3b.log
SlurmctldPort=9237
SlurmdPort=9238
AccountingStoragePort=9219

The contents of the corresponding MSLURM configuration ("mslurm.conf") covering all these combinations are illustrated below:

MSLURM_SLURM_CONF="/etc/slurm/cluster=%cn/slurm.conf"
MSLURM_SLURMDBD_CONF="/etc/slurm/db=%un/slurmdbd.conf"
MSLURM_SLURM_CLUSTERNAMES_baz="foo bar"
MSLURM_SLURM_CLUSTERNAMES_unionb="cluster1b cluster2b cluster3b"
MSLURM_SLURM_CLUSTERNAMES="$MSLURM_SLURM_CLUSTERNAMES_baz $MSLURM_SLURM_CLUSTERNAMES_unionb"
MSLURM_SLURMDBD_UNIONNAMES="baz unionb"

Here the fixed wildcards %cn and %un stand for a SLURM cluster name and a SLURM database name, respectively. A SLURM cluster name must be identical to the value of the ClusterName variable in the respective SLURM configuration file.

8. Compute nodes

You also have to make sure that the SLURM nodes participating in a SLURM cluster (for example "foo") receive the content of the corresponding SLURM configuration (here "/etc/slurm/cluster=foo/slurm.conf") under the standard path ("/etc/slurm/slurm.conf"); one possible way to do this is sketched below. Since the compute nodes thus carry a "standard" SLURM configuration, SLURM is managed on them with the ordinary SLURM tools ("/etc/init.d/slurm", "sinfo", "scontrol", ...).
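The following is only a hypothetical sketch of such a distribution step; the node names and the use of scp are assumptions and not part of MSLURM itself.

# Hypothetical: copy cluster "foo"'s configuration to its compute nodes so
# that they see it under the standard path /etc/slurm/slurm.conf.
for node in node01 node02 node03; do
    scp /etc/slurm/cluster=foo/slurm.conf ${node}:/etc/slurm/slurm.conf
done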

9. Examples

The daemons belonging to a single SLURM cluster or SLURM database are managed on the MSLURM master via

mslurm <SLURM cluster> {start|status|stop}
or
mslurmdbd <SLURM database> {start|status|stop}

Any SLURM command is executed against a SLURM cluster with

mslurm <SLURM cluster> <SLURM-Command> [<arguments>]

All SLURM clusters of a SLURM database can be addressed by the name of the SLURM database:

mslurm <SLURM database> <SLURM-Command> [<arguments>]

All objects can be addressed together by using the option "-a" instead of specifying a SLURM cluster or SLURM database:

mslurm -a status
mslurmdbd -a status
mslurm -a sinfo

(The "-a" option is used by the MSLURM startup scripts when booting.)

Comma-separated lists of objects and of actions can also be given:

mslurm foo,unionb scontrol show config
mslurm foo status,sinfo
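Putting the pieces together, a typical session on the MSLURM master might look as follows; the cluster and database names are those of the configuration example in section 7, and the command forms are the ones listed above.

# Start the slurmdbd of database "baz", then the slurmctld of cluster "foo",
# respecting the dependency order of section 5 (mysqld_multi is assumed to
# be running already).
mslurmdbd baz start
mslurm foo start
# Check daemon status and node state of cluster "foo".
mslurm foo status,sinfo
# Address all clusters that use database "unionb" at once.
mslurm unionb scontrol show config
# Stop all clusters and databases again.
mslurm -a stop
mslurmdbd -a stop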