Federated Cluster Support


Brian Christiansen and Morris Jette, SchedMD LLC
Slurm User Group Meeting 2015

Background
- Slurm has long had limited support for federated clusters
- Most commands support a --cluster (-M) option to route requests to different clusters
- Submitted jobs are routed to a single cluster
- Each cluster operates independently
- Job IDs on each cluster are independent (two jobs can have the same ID)
- No cross-cluster job dependencies
- No job migration between clusters
- No unified view of system state; each cluster is largely independent

New Capabilities
- Job migration: pending jobs automatically migrated to less busy clusters
- Fault tolerance: participating clusters take over the work of a failed cluster
- Cross-cluster job dependencies
- Unified views
- Easy administration: clusters can be added to or removed from the federation with a simple configuration change; no extra information is required in the database

Related Work
- Some earlier work addressed these functional shortcomings, but it lacked scalability and was never integrated into an official release
- The major problem was the use of a single daemon to manage the mapping of job IDs to clusters
- slurmdbd maintained a table identifying which job IDs were on each cluster
- slurmdbd was also used for job dependency testing
- This placed a very heavy load on slurmdbd just to locate jobs

Design Goals
- Performance: little to no reduction in the throughput of each cluster; performance scales with cluster count
- Scalability: no reduction in the scalability of individual clusters; able to support many federated clusters
- Fault tolerance: no single point of failure
- Ease of use: unified enterprise-wide view with minimal change to the user interface
- Stability: no change in behavior for clusters not explicitly placed into a federation

Eliminating the Bottleneck
- Need a mechanism to identify the cluster associated with a job ID without a slurmdbd lookup
- Make use of the 32-bit job ID and embed the cluster ID within it:
- Bit 31: flag marking a federated cluster job ID
- Bits 23-30: cluster ID (0 to 255)
- Bits 0-22: job ID (0 to 8,388,607)
- The result is a job ID that is unique across all clusters
- Large but unique, e.g. 2164339463 (cluster flag bit + cluster ID 2 + job ID 78,599); see the sketch below
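The bit layout above can be checked with a few lines of C. The sketch below is illustrative only, assuming this layout; the macro and function names are hypothetical, not Slurm's actual API. It reproduces the example ID from the slide.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical helpers for the federated job ID layout described above:
     * bit 31 = federation flag, bits 23-30 = cluster ID, bits 0-22 = local job ID. */
    #define FED_JOB_FLAG      (1u << 31)
    #define FED_CLUSTER_SHIFT 23
    #define FED_CLUSTER_MASK  0xffu      /* cluster IDs 0 to 255 */
    #define FED_LOCAL_MASK    0x7fffffu  /* local job IDs 0 to 8,388,607 */

    static uint32_t fed_job_id_build(uint32_t cluster_id, uint32_t local_id)
    {
        return FED_JOB_FLAG |
               ((cluster_id & FED_CLUSTER_MASK) << FED_CLUSTER_SHIFT) |
               (local_id & FED_LOCAL_MASK);
    }

    int main(void)
    {
        /* Example from the slide: cluster ID 2, local job ID 78,599 */
        uint32_t id = fed_job_id_build(2, 78599);

        printf("federated job ID: %u\n", (unsigned)id);                  /* 2164339463 */
        printf("is federated:     %u\n", (unsigned)((id & FED_JOB_FLAG) != 0));
        printf("cluster ID:       %u\n", (unsigned)((id >> FED_CLUSTER_SHIFT) & FED_CLUSTER_MASK));
        printf("local job ID:     %u\n", (unsigned)(id & FED_LOCAL_MASK));  /* 78599 */
        return 0;
    }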

Job Submission
- sbatch, salloc, and srun are supported
- The available clusters (IP address + port) are obtained from the local slurmctld
- The local slurmctld keeps a cache of this information from slurmdbd; slurmdbd is the backup source for cluster information
- A master job is submitted to a randomly selected cluster, after a lightweight check that the job can run on that cluster (sketched below)
- Phantom jobs are submitted to a number of clusters
- All jobs carry the job ID and the locations of all phantom jobs
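As a rough illustration of the flow above, and nothing more: the struct and helper names below are hypothetical, and the exact mechanism was still open at the time of the talk (note the "MAGIC: TBD" on the next slide). The sketch picks a random cluster for the master job and records phantom submissions to the sibling clusters.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Hypothetical types; Slurm's real data structures differ. */
    typedef struct {
        const char *name;
        const char *addr;   /* IP address of the cluster's slurmctld */
        int         port;
    } fed_cluster_t;

    /* Stand-in for the lightweight "can this job run here?" check. */
    static int job_can_run(const fed_cluster_t *c) { (void)c; return 1; }

    int main(void)
    {
        /* Cluster list as it might be cached by the local slurmctld. */
        fed_cluster_t clusters[] = {
            { "ClusterA", "10.0.0.1", 6817 },
            { "ClusterB", "10.0.0.2", 6817 },
            { "ClusterC", "10.0.0.3", 6817 },
        };
        int n = (int)(sizeof(clusters) / sizeof(clusters[0]));
        unsigned int job_id = 2164339463u;   /* federated job ID from the previous slide */

        srand((unsigned)time(NULL));

        /* Pick a random cluster for the master job; naive fallback if the check fails. */
        int master = rand() % n;
        if (!job_can_run(&clusters[master]))
            master = (master + 1) % n;

        /* Submit phantom jobs to the remaining clusters; every job would carry the
         * federated job ID and the locations of all phantom jobs. */
        for (int i = 0; i < n; i++) {
            if (i == master)
                continue;
            printf("phantom job %u -> %s (%s:%d)\n", job_id,
                   clusters[i].name, clusters[i].addr, clusters[i].port);
        }
        printf("master job %u -> %s\n", job_id, clusters[master].name);
        return 0;
    }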

Job Submission
- MAGIC: TBD

Cross-Cluster Job Dependencies
- Dependencies are created with the standard --dependency= syntax
- The master controller checks the status of remote jobs on the other clusters

Fault Tolerance / Job Migration
- Controllers coordinate taking over a job if the master controller has been down for a period of time, or if the job can be started sooner elsewhere
- RPCs for jobs started on another cluster are re-routed
- There are many moving parts to consider in order to prevent split-brain
- Two copies of the same job must never run at the same time

Configuration
- Parameter identifying whether the cluster is part of a federation: Federation=yes|no
- Parameter controlling how many phantom jobs to spawn: PhantomJobs=#
- A cluster can be moved into or out of a federation without a Slurm restart (via the scontrol command)
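The slides do not say where these proposed parameters would be set, so the fragment below is only a guess at how a federation member's configuration might look if they landed in slurm.conf; the parameter names come from the slide, their placement is an assumption.

    # Hypothetical slurm.conf fragment for a federation member
    ClusterName=ClusterA
    Federation=yes      # this cluster participates in the federation
    PhantomJobs=3       # number of phantom jobs to spawn per submission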

Unified Views
- Provide a unified view of the federated clusters by default
- A subset of the federated clusters can be requested using the -M option
- squeue, sinfo, and sprio will output the cluster name in a separate column
- squeue can sort by cluster name

Example squeue output:

    CLUSTER   JOBID       PARTITION  NAME  USER   ST  TIME  NODES  NODELIST(REASON)
    ClusterA  2164339463  debug      wrap  brian  PD  0:00  10     (Resources)
    ClusterA  2164329686  debug      wrap  brian  PD  0:00  10     (Priority)
    ClusterB  2197893831  power      wrap  brian  PD  0:00  10     (Resources)
    ClusterA  2164333731  long       wrap  brian  PD  0:00  10     (Resources)
    ClusterC  2265002695  debug      wrap  brian  PD  0:00  10     (Priority)
    ClusterC  2265002695  debug      wrap  brian  R   0:36  1      c11
    ClusterA  2164333936  debug      wrap  brian  R   0:36  10     a[01-10]
    ClusterB  2197893831  power      wrap  brian  R   0:36  1      b180
    ClusterC  2265002695  debug      wrap  brian  R   0:36  102    c[1-10,15,51-100,120-140,143-160]
    ClusterA  2164333932  short      wrap  brian  R   0:39  1      a52
    ClusterA  12345       short      wrap  brian  R   0:39  1      d79
    ClusterA  2164333933  short      wrap  brian  R   0:39  1      a113

Schedule
- Development to begin in Q4 2015
- To be included in the next major release, version 16.05

Questions?