Slurm at the George Washington University Tim Wickberg - Slurm User Group Meeting 2015


Slurm at the George Washington University Tim Wickberg - wickberg@gwu.edu Slurm User Group Meeting 2015 September 16, 2015

Colonial One

What's new?
Only major change was the switch to FairTree - thanks to BYU for that feature.
Fewer complaints of anomalous priority behavior.
Looking forward to TRES and expanded QOS options in 15.08.
One email per job array is also a nice incidental benefit - one user ran through 100x 1000-array jobs in one night; it took their Gmail account two days to recover.
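For reference, enabling Fair Tree on the pre-15.08 series is a one-line slurm.conf change; this is the generic flag, not necessarily GW's exact configuration:

# slurm.conf - switch the multifactor plugin to the Fair Tree algorithm
PriorityType=priority/multifactor
PriorityFlags=FAIR_TREE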

What's new?
slurmdbd migration yesterday was painless: 14 minutes for 1.6 million job records.
MySQL database on a single pair of SAS disks.
Accounting DB is ~500MB uncompressed as a mysqldump file.
15.08 migration Monday during a maintenance outage - not doing that step live, although I don't expect any issues.
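A hedged sketch of the kind of dump-then-upgrade sequence described above; the database name is the slurmdbd default (slurm_acct_db), and the service commands and dump path are assumptions, not GW's actual procedure:

# stop accounting, snapshot the database, then upgrade slurmdbd
service slurmdbd stop
mysqldump slurm_acct_db > /root/slurm_acct_db-$(date +%Y%m%d).sql   # ~500MB uncompressed
# install the new slurmdbd, then start it and let it convert the tables
service slurmdbd start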

Colonial One - Current System
Dell C8220 cluster, ~200 nodes:
50x GPU nodes, each with dual NVIDIA K20 GPUs
150x CPU nodes, each with dual 2.6GHz 8-core Intel Xeon CPUs and 64/128/256GB of RAM
2TB, 48-core hugemem node
FDR InfiniBand, 2-to-1 oversubscription
Dell NSS NFS server, ~120TB
Dell / Terascala HSS Lustre storage system, ~250TB

Priority modeling
Complicated priority model due to the funding model.
Using the priority/multifactor plugin.
The scheduler will try to balance groups' runtime over a longer period, so they match their priority targets.
Top level: contributing schools, colleges, research centers.
Second level: research groups, departments.
Third: separate research groups in a department.
Fourth: individual users in a group.
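A minimal sketch of what the multifactor configuration behind such a model looks like in slurm.conf; the weights and half-life below are illustrative placeholders, not GW's actual values:

# slurm.conf - multifactor priority (weights are illustrative only)
PriorityType=priority/multifactor
PriorityDecayHalfLife=14-0
PriorityWeightFairshare=100000
PriorityWeightAge=1000
PriorityWeightQOS=10000
PriorityWeightJobSize=0
PriorityWeightPartition=0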

Fairshare model - top tier

# sacctmgr show assoc tree format="account,fairshare"
    Account      Fairshare
    ------------ ---------
    root                 1
     c1             parent
     cbi              1880
     ccas            10080
     other            1960
     seas             4320
     sphhs             720

c1 account - support folks, give them top ranking
other - 10% open to anyone (10 points per node)
Everyone else - 90 points per node

Fairshare model - secondary tier

# sacctmgr show assoc tree format="account,fairshare"
    Account        Fairshare
    -------------- ---------
    root                   1
     ccas              10080
      astrophysics        17
      ccas-other          31
      economics            1
      qcd                 16

At the second tier, set 1 point per node.
Fairshare is calculated separately at each level; the ratio between accounts at each level is all that matters.
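The hierarchy above is maintained with standard sacctmgr calls; a sketch of how the ccas sub-tree might be populated (the user name "alice" is hypothetical, and the options shown are standard sacctmgr syntax rather than GW's exact commands):

sacctmgr add account ccas fairshare=10080
sacctmgr add account astrophysics parent=ccas fairshare=17
sacctmgr add account economics parent=ccas fairshare=1
sacctmgr add account qcd parent=ccas fairshare=16
sacctmgr add user alice account=astrophysics fairshare=1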

Fairshare
Useful commands: sshare, sprio, sreport
Force users to specify a time limit: JobSubmitPlugins="job_submit/require_timelimit"
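Example invocations of those commands; the account name and dates are illustrative, the options themselves are standard:

sshare -l -A ccas                    # fairshare usage and effective shares for the ccas tree
sprio -l                             # per-job breakdown of the multifactor priority
sreport cluster AccountUtilizationByUser start=2015-09-01 end=2015-09-16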

Support Level 3 support contract with SchedMD Filed zero bugs in the past year - code just works As demonstrated yesterday, I trust Slurm more than any other component in the cluster.

Feature requests
Independent fairshare hierarchies for specific partitions.
Our current model implies GPU and CPU node cycles are interchangeable - no one has complained about this yet.
Likely fix: use Federated Clusters in the next release.
sreport support for partitions - same rationale as above; currently using ugly/slow bash scripts to aggregate statistics.
Likely fix, again: split into clusters and federate.
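For illustration, the per-partition aggregation those scripts do can be approximated with sacct and awk; a sketch with example dates, not GW's actual script:

#!/bin/bash
# sum CPU usage per partition over a date range (dates are examples)
sacct -a -X -n -P -S 2015-09-01 -E 2015-09-16 -o Partition,CPUTimeRAW |
    awk -F'|' '{ sum[$1] += $2 }
               END { for (p in sum) printf "%-12s %.0f core-hours\n", p, sum[p]/3600 }'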

Feature requests
Ancillary commands for users to figure out what the problem with their job submission is - an "swtf" command?
Succinct description of current priority level and other issues that may be delaying launch.
Exposing users to sprio and sreport hasn't been effective - too much info to parse.
Need something simple: "You run too much. Your recent use is 500% of your fairshare value. Buy more hardware for your group."
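No such command exists today; a sketch of what a local "swtf" wrapper could stitch together from existing tools (the script name and output wording are hypothetical):

#!/bin/bash
# swtf <jobid> - summarize why a pending job hasn't started (hypothetical wrapper)
jobid=$1
user=$(squeue -h -j "$jobid" -o %u)
echo "Pending reason:     $(squeue -h -j "$jobid" -o %r)"
echo "Priority breakdown:"
sprio -l -j "$jobid"
echo "Fairshare for $user:"
sshare -l -U -u "$user"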

Feature requests
Better status in sinfo - idle nodes aren't necessarily idle; they are likely waiting for additional nodes to free up to start larger jobs.
Introduce a new "earmark" state?

Feature requests
Speculative Fairshare Accounting:
Account for the full value of active allocations in fairshare immediately, and refund unused cycles after completion.
Current behavior seems to be to wait until after job completion to book usage.
Causes odd priority swings, exacerbated by long run times.
(BYU also wants to see this, but Ryan's slides were already fixed.)

Novel (ab)uses of Slurm
Two non-traditional uses of the Slurm scheduler:
Fastscratch - dynamic SSD scratch space allocation
Backups
If all you have is a hammer, everything looks like a nail. Slurm is a pretty good hammer...

Fastscratch
Motivation: genomic sequencing apps - big data, random I/O with mixed reads and writes.
Our NFS and Lustre fileservers hate this.
SSDs handle this much better than disks, but...
We can't afford to install large SSDs into all nodes, and don't want a special set of nodes just for them - and this wouldn't deal with shared access across nodes.
Possible solution - build a fileserver with ~3TB of SSDs.

Fastscratch
Need to manage space and assign it to active jobs only.
First approach - some way to use GRES? GRES works with devices in nodes; this is a separate system.
Jobs shouldn't have access to the fileserver, let alone launch jobs on it.
There's a second mechanism in Slurm that tracks resources: Licenses

Fastscratch
Need to manage space and assign it to active jobs only.
First approach - some way to use GRES? GRES works with devices in nodes; this is a separate system.
Jobs shouldn't have access to the fileserver, let alone launch jobs on it.
There's a second mechanism in Slurm that tracks resources: Licenses
Burst Buffers
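A sketch of the license-based bookkeeping using standard Slurm syntax; the license name and the assumption that one license unit equals 1GB of the ~3TB SSD pool are for illustration only, and the prolog/epilog that would create and clean the per-job directory is not shown:

# slurm.conf - expose the fastscratch pool as a cluster-wide license count
# (assumption: 1 license = 1GB of the ~3TB SSD pool)
Licenses=fastscratch:3000

# a job asks for 50GB of fast scratch by requesting 50 licenses:
sbatch -L fastscratch:50 job.sh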

Backups
Disk backup servers located in a separate datacenter - ~100TB usable in 4RU.
ZFS on FreeBSD, uses zfs snapshots.
Dedicated transfer node located within the cluster: syncbox.
Goal: run rsync on separate user directories in parallel.
Bottlenecks are (1) SSH encryption speed (limited by core speed), (2) TCP throughput between locations, then (3) disk I/O. Running in parallel lets us get past (1) and (2).

Backups
Create a special partition:

PartitionName=syncbox Nodes=syncbox MaxTime=7-0 RootOnly=YES Hidden=YES Shared=YES:8

Run each rsync as a separate job:

#!/bin/bash
#SBATCH -p syncbox --share -t 1-0
# sync
rsync -avd /home/$1 backup1:/storage/home
# snapshot
ssh backup1 zfs snapshot storage/home/$1@$(date +%Y%m%d)
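A hedged example of how the per-directory jobs might be kicked off, assuming the batch script above is saved as sync.sh (the script name and loop are illustrative, not GW's exact driver):

# submit one sync job per top-level home directory
for d in /home/*; do
    sbatch sync.sh "$(basename "$d")"
done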

Backups Running for over a year, once weekly. Works perfectly, full backup pass takes ~6 hours. No tape libraries or tape-like software to wrangle.

Thank You Documentation: http://colonialone.gwu.edu Twitter: @GWColonialOne Support Email: hpchelp@gwu.edu Tim Wickberg wickberg@gwu.edu