Good to Great: Choosing NetworkComputer over Slurm
NetworkComputer White Paper

2560 Mission College Blvd., Suite 130, Santa Clara, CA 95054
(408) 492-0940
Introduction

Are you considering Slurm as your job scheduler, or are you a current Slurm user wondering whether it is right for you because of issues you have encountered? If you are an administrator or user who cares about efficiency and reliability for high-volume workloads, it may be worthwhile to consider NetworkComputer, a more powerful, efficient, and reliable commercial job scheduler that is an industry standard for scalable high-performance computing. And if you are a Slurm user, you may want to know how Slurm compares to NetworkComputer before making a move. This white paper helps bridge that knowledge gap and shows that it is relatively easy to migrate from Slurm to NetworkComputer.

What is Slurm?

Slurm is a free workload manager that has been available since the early 2000s. It has known limitations in scaling and in meeting job-capacity needs, along with an inability to fully utilize all available computing resources, so its lack of robustness limits its applications in commercial markets. It also lacks monitoring capabilities, which is a major pitfall.

What is NetworkComputer?

NetworkComputer by Runtime is a commercial, enterprise-grade job scheduler. It shares some basic capabilities with Slurm but offers much more practical value to the end user for everyday professional use. As a commercial scheduler used by top companies around the world, it is many times more scalable in capacity and performance, and it is much easier to use. As the industry's fastest job scheduler, NetworkComputer is built to be lightweight and simple, so it can also be deployed as a private scheduler for a single person, a group, or a project. If your productivity concerns include achieving the most efficient utilization of your expensive licenses and hardware resources, NetworkComputer will best fit your needs.

Comparing NetworkComputer vs. Slurm Terminology

In Slurm, the central component is called slurmctld.
It manages the workload and all scheduling. Each computer (referred to as a "node" by Slurm) runs a daemon called slurmd, which performs some analysis of the computer it is running on and then accepts jobs sent from slurmctld. The configuration for a Slurm cluster is typically kept in a single file, usually found at /etc/slurm-llnl/slurm.conf.
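For illustration, a minimal slurm.conf fragment might look like the following. This is a hypothetical sketch (cluster name, host names, and sizes are invented), showing how nodes are declared statically in the file rather than detected elastically:

```
# Hypothetical slurm.conf fragment: nodes are declared statically,
# so adding a machine means editing this file and reconfiguring.
ClusterName=demo
ControlMachine=node3
NodeName=node[1-3] CPUs=4 RealMemory=8000 State=UNKNOWN
PartitionName=debug Nodes=node[1-3] Default=YES MaxTime=INFINITE State=UP
```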
In NetworkComputer, the central component is called vovserver, and the daemon running on each remote computer is called vovslave. The configuration is spread over several files, all contained in a directory called "vnc.swd" (pronounced "swid"), the "Server Working Directory". NetworkComputer has one file to describe the list of slaves and another to describe the list of resources, such as licenses and limits. Everything is "elastic" in the sense that slaves and resources can be added at any time, and the characteristics of a vovslave can be modified at will. This flexibility is important and a key reason why NetworkComputer is used for commercial purposes. By default, vovslaves automatically detect all characteristics of the machine they are running on, including RAM and cores. Compare this to Slurm, which is non-elastic in behavior.

  NetworkComputer              Slurm                        Description
  vovserver                    slurmctld                    The hub of the system; manager of workload and scheduling
  vovslave                     slurmd                       Agent to execute jobs on a "node"
  .../vnc.swd/slaves.tcl       /etc/slurm-llnl/slurm.conf   Configuration files
  .../vnc.swd/resources.tcl

Comparing NetworkComputer vs. Slurm Commands

Slurm's command-line interface consists of a few commands such as sbatch, scancel, squeue, sinfo, scontrol, and smap. NetworkComputer's command-line interaction is based on two commands: ncmgr, used by the manager to start and stop the system, and nc <command>, used by all users. Here is the usage message from nc:
% nc
nc: Usage Message

Usage: nc [-q queuename] <command> [command options]

Queue selection:
  The default queue is called "vnc". You can specify a different queue
  with the option -q <queuename> or by setting the environment
  variable NC_QUEUE.

Commands:
  clean          Cleanup log files and env files.
  debug          Show how to run the same job without NetworkComputer.
  dispatch       Force dispatch of a job to a specific slave.
  forget         Forget old jobs from the system.
  getfield       Get a field for a job.
  gui            Start a simple graphical interface.
  help           This help message.
  hosts          Show farm hosts (also called slaves).
  info           Get information about a job and its outputs.
  list           List the jobs in the system.
  jobclass       List the available job classes.
  kerberos       Interface to Kerberos (experimental).
  modify         Modify attributes of scheduled jobs.
  monitor        Monitor network activity.
  rerun          Rerun a job already known to the system.
  resources      Shows resource list and current statistics.
  resume         Resume a job previously suspended.
  run <job>      Run a new job (also called 'submit').
  preempt        Preempt a job.
  slavelist      Show available slave lists.
  stop           Stop jobs.
  submit <job>   Same as 'run'.
  summary        Get a summary report for all my jobs.
  suspend        Suspend the execution of a job.
  wait           Wait for a job to complete.
  why            Analyze job status reasons.

Unique abbreviations for commands are accepted.

Advanced features:
  cmd <command>      Execute an arbitrary VOV command in the context
                     of the NetworkComputer server.
  source <file.tcl>  Source the given Tcl file.
  -                  Accept commands from stdin.

For more help type: % nc <command> -h
Copyright (c) Runtime Design Automation.

In Slurm, you need to write a script to submit a command, whereas NetworkComputer allows the direct submission of any command. For example, to submit the command "sleep 0" to Slurm, a script like this must be used:
#!/bin/csh -f
# This is my script, called ./sleep0.csh
sleep 0

  NetworkComputer                      Slurm                            Description
  nc run [OPTIONS] ./myscript.csh      sbatch [OPTIONS] ./myscript.csh  Methods to submit batch jobs
  nc run [OPTIONS] sleep 0
  nc stop ...                          scancel ...                      De-schedule submitted jobs; stop them if they are running
  nc list                              squeue                           List the jobs in the system
  nc info JOBID                        scontrol show job JOBID          Detailed information about one job
  nc getfield JOBID
  nc wait JOBID                                                         Wait for the specified job to be done
  nc gui &                             smap                             Graphical visualization of jobs
  nc hosts, nc resources, nc cmd vsi   sinfo                            Various commands to show information about the system
  nc hosts, nc monitor                 sinfo -N                         List information about machines connected to the scheduler

Jobs Visualization and Interactive Queries in NetworkComputer

In Slurm, there is no comprehensive facility to visualize your job status or to drill down easily, point-and-click, for debugging. In NetworkComputer, you get an interactive GUI where you can visualize the status of all scheduled jobs and drill down into any job for real-time details. Much other information, such as workload and resource details, is also available.

Figure 1: The NetworkComputer GUI shows jobs as colored boxes. Green jobs are done, red jobs have failed, orange jobs are currently running, and cyan jobs are waiting for resources to become available.
Figure 2: In NetworkComputer, you can customize your view so that each box in the GUI shows the specific job details that matter to you. You can also easily drill down to get more job details.

Figure 3: NetworkComputer gives users views of workload and resources.

Comparing NetworkComputer Performance vs. Slurm for Light Workloads
In this example, the Slurm cluster consists of three identical desktops, called node1, node2, and node3, with the master running on node3. The NetworkComputer setup uses the same hardware, with the server running on node2. With a light load, the difference between Slurm and NetworkComputer is negligible.

In Slurm:

% sbatch ./sleep0.csh
Submitted batch job
% scontrol show job
   JobId= Name=sleep0.csh
   UserId=joe(1024) GroupId=joe(1002)
   Priority= Account=(null) QOS=(null)
   JobState=COMPLETED Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=UNLIMITED TimeMin=N/A
   SubmitTime= T11:11:42 EligibleTime= T11:11:42
   StartTime= T11:11:42 EndTime= T11:11:42
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=debug AllocNode:Sid=node2:22587
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=node2 BatchHost=node2
   NumNodes=1 NumCPUs=1 CPUs/Task=1 ReqS:C:T=*:*:*
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/home/joe/tmp/./sleep0.csh
   WorkDir=/home/joe/tmp

In NetworkComputer:

% nc run sleep 0
Fairshare = /time/users
Resources = linux64
Env       = SNAPSHOT(vnc_logs/snapshots/joe/linux64/env26253.env)
Command   = vw sleep 0
Logfile   = vnc_logs/ /
JobURL    =
JobId     =

% nc info
Id,User,Group    ,joe.joe,/time/users.joe
Environment      SNAPSHOT(vnc_logs/snapshots/joe/linux64/env26253.env)
Directory        /home/joe
Command          sleep 0
Resources        linux64
Submitted from   node2
Submitted at     Wed Jan 18 11:11:16 PST 2017
Priorities       schedule=normal execution=low
PlacementPolicy  fastest,pack
Status           Done
Host             localhost
Slave            localhost
QueueTime        0s
CPUTime          0.01
MaxRAM           0MB
Duration         0
Age              1m31s
AutoForget       1
Job is Done
Main Reason: This job successfully executed.

To simplify automation, NetworkComputer helps the developer in simple but effective ways, such as:

- The option -v 1 of nc run, which returns only the ID of the submitted job
- The command nc getfield, which gives direct access to one or more fields of a job without requiring any grep/awk work

NetworkComputer:

% set id = `nc run -v 1 sleep 0`
% nc wait $id
% nc getfield $id status
VALID

Slurm has no equivalent.

NetworkComputer Outperforms Slurm for Normal to Heavy Workloads

This is the major reason why Slurm is not fit for commercial needs: it cannot handle heavy loads. In fact, it struggles even with less-than-heavy loads, as the next example shows. In NetworkComputer, a constant load of 100,000 or more jobs in the queue is considered ordinary, while it chokes Slurm. A million jobs in the queue is a heavy load, easily handled by NetworkComputer.

Assume we have a workload of 80,000 jobs. In Slurm, you may want to submit the jobs with an array. The maximum array size in our default installation appears to be 1000 elements, so we need to submit 80 arrays. Our Slurm installation stops accepting jobs while fewer than 10,000 jobs are in the queue, which is a serious limitation, while NetworkComputer easily accepts the whole workload in about 6 seconds.
NetworkComputer:

% time repeat 80 nc run -v 0 -array 1000 sleep 0
... omitting some output from 'time' ...
0.052u 0.004s 0+0k 0+8io 0pf+0w
0.043u 0.008s 0+0k 0+8io 0pf+0w
0.051u 0.004s 0+0k 0+8io 0pf+0w
0.052u 0.000s 0+0k 0+8io 0pf+0w
3.974u 0.462s 0+0k 0+640io 0pf+0w

% nc summary
NC Summary For Set System:User:joe
TOTAL JOBS   80,001   Duration: 3m15s
  Done          690
  Queued     79,309
  Running         2

Slurm:

% repeat 80 sbatch --array= ./sleep0.csh
Submitted batch job
Submitted batch job
Submitted batch job
...
sbatch: error: Slurm temporarily unable to accept job, sleeping and retrying.

In a similar experiment at much larger scale, NetworkComputer easily handles 880,000 jobs. To check the status of the workload, we can use the "summary" report, which is efficient, compact, and easy to understand (Slurm has no equivalent function):

% nc summary -a -b
NC Summary For Set System:jobs
TOTAL JOBS  101,821   Duration: 37m40s
  Done       26,677
  Queued     75,138
  Running         4
BKT  JOBS  PRI  AGE  GROUP            USER  TOOL      WAITING FOR
...             s    /time/users.joe  joe   hostname  HW linux64
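As noted earlier, Slurm has no counterpart to nc getfield, so extracting a single field from scontrol output takes grep/awk-style text processing. A minimal sketch of that work, with a sample output line hard-coded for illustration (in real use you would pipe in `scontrol show job $JOBID` instead):

```shell
#!/bin/sh
# Sketch: pull one field out of 'scontrol show job'-style key=value output.
out='JobId=1234 Name=sleep0.csh JobState=COMPLETED ExitCode=0:0'
# Split on spaces, then select the value of the JobState key.
echo "$out" | tr ' ' '\n' | awk -F= '$1 == "JobState" { print $2 }'
# prints COMPLETED
```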
Comparing NetworkComputer vs. Slurm Scheduler Status

In Slurm, to get the scheduler status, you can execute scontrol ping and get a very simple report:

% scontrol ping
Slurmctld(primary/backup) at node3/(null) are UP/DOWN

In NetworkComputer, a common method to check status is "nc cmd vsi" (vsi stands for vov-server-info), which reports much more meaningful information:

% nc cmd vsi
Vov Server Information - 01/10 vnchq@node3:6271
URL:
Jobs:     101,892   Workload:
Files:    101,904   - running:        5
Sets:          22   - queued:    59,242
Retraces:       0   - done:      42,572
                    - failed:
Slaves:         2   Buckets:          1
- busy:         1   Duration:        0s
- full:         1   SchedulerTime: 0.00s
Slots:              TotalResources:  14
Pid:          825   Saved: 1h29m ago
Size:          MB   TimeTolerance:   3s

Recent jobs for user joe:
Done     vw hostname > vnc_logs/ /
Done     vw hostname > vnc_logs/ /
Done     vw hostname > vnc_logs/ /
Running  vw hostname > vnc_logs/ /
Running  vw hostname > vnc_logs/ /
Running  vw hostname > vnc_logs/ /

Comparing NetworkComputer vs. Slurm Suspension Capabilities

In Slurm, you can suspend and resume a job only if you are root or the admin user. This is a serious limitation. For example, if we try to suspend our job, we get:

% scontrol suspend
slurm_suspend error: Access/permission denied
% scontrol resume
slurm_suspend error: Access/permission denied

In NetworkComputer, the owner of a job can suspend it and resume it, a basic capability for any practical usage. This capability can also be given to any user who has ADMIN privileges.

% nc suspend
vnc 02/20 message: Suspending job
% nc suspend
vnc 02/20 message: No need to suspend: it is suspended.
% nc resume
vnc 02/20 message: Resuming job
% nc resume
vnc 02/20 message: No need to resume: it is running.

Another capability is preempting a job, with nc preempt:

% nc preempt

In this case, the job is suspended, all resources associated with the job (including licenses and CPUs) are freed, and those resources are made available to other "more important" jobs in the queue. If no such job exists, the preempted job is automatically resumed. Slurm has a similar but less full-featured preemption capability.

Comparing How NetworkComputer vs. Slurm Handle Dependencies

In Slurm, to execute a job after another one has completed, we can say:

% sbatch --dependency=afterok: ./mysleep.csh

In NetworkComputer, we have a dependency similar to "afterok":

% set j1 = `nc run -v 1 sleep 10`
% nc run -dep $j1 sleep 2

In addition, NetworkComputer has a key advantage: a simple way of waiting for a job to complete with nc wait, which does not exist in Slurm:

% set j1 = `nc run -v 1 sleep 10`
% nc wait $j1

If we want to run one job at a time, in Slurm we can use the "singleton" dependency, while in NetworkComputer we can use the "-limit 1" option of "nc run":
NetworkComputer:
% nc run -limit 1 -array 1000 sleep 0

Slurm:
% sbatch -J myname --array= ./mysleep0.csh

Comparing How NetworkComputer vs. Slurm Manage Software Licenses

In Slurm, licenses can be represented by "Licenses" lines in the slurm.conf file:

# Fragment of slurm.conf
Licenses=verilog:3,spice:2

In NetworkComputer, licenses are sampled automatically, typically every 30 seconds, by the LicenseMonitor subsystem, which immediately updates the scheduler. This allows the automatic tracking and management of all features serviced by FLEXlm or any other license daemon. NetworkComputer typically handles many hundreds of such licenses. For commercial purposes, this is a much more robust system.

Comparing NetworkComputer vs. Slurm Architecture

In Slurm, the list of current jobs (fewer than 40k jobs) is held in the directory /var/lib/slurm-llnl/slurmctld on the master node. Each job is a sub-directory which contains:

- A copy of the submission script
- A snapshot of the submission environment

In NetworkComputer, all job information is kept efficiently in memory. Here is a snapshot of the two daemons running on the same machine, each after running about 400,000 jobs:

NetworkComputer:
ncadmin   ?  S   Feb17   1:56  vovserver -p nc

Slurm:
slurm     ?  Sl  Feb17  68:05  /usr/sbin/slurmctld
Note that the NetworkComputer vovserver memory footprint is less than half the size of the slurmctld memory footprint, even though it holds all 400k jobs in memory. NetworkComputer's memory management is thus far superior to Slurm's.

So, You Want to Use NetworkComputer with Slurm?

Yes, you can get the capacity and ease-of-use benefits of NetworkComputer while using Slurm as the main allocator of computing resources. In situations where you need to retain Slurm for whatever reason, NetworkComputer can easily piggyback on it. This is like having your own private scheduler for your workload without violating the rules of your organization. A sample method to test-drive NetworkComputer using computing resources from your existing Slurm installation:

1. Install NetworkComputer on a shared file system (example: /remote/sw/runtime/).

2. Set up your shell by sourcing one of the setup scripts found in the installation directory (example: /remote/sw/runtime/common/etc/vovrc.{sh,csh}).

3. Start your private scheduler:

   % ncmgr start -dir . -queue my_vnc ...
   % setenv NC_QUEUE my_vnc

4. Create the following script, which starts a transient vovslave on the current host:

   % cat ncslave.csh
   #!/bin/csh -f
   # Start a slave with 1 slot, max load 100, for no more than 2 hours
   vovslaveroot -T 1 -M 100 -a "@HOST@_@PID@" -z 1m -Z 2h

5. Request computing resources from Slurm:

   % vovproject enable my_vnc
   % sbatch ./ncslave.csh
   % sbatch --array=1-50 ./ncslave.csh

Now you can submit jobs to your NetworkComputer instance and use resources from Slurm. If you are the network administrator, you can someday consider moving the entire management of your clusters to NetworkComputer.
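Once the transient vovslaves started by Slurm have checked in, the private queue behaves like any other NetworkComputer instance. A hypothetical session (job output elided) might look like this, using only the nc commands introduced above:

```
% setenv NC_QUEUE my_vnc
% nc run sleep 0
% nc summary
% nc hosts
```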
Summary

Although Slurm is free, it has major limitations in scalability and usability that prevent it from being a dependable solution for commercial applications. That is why you won't find it used in commercial settings with serious reliability needs. Able to deal with only lighter workloads, it lacks the capacity needed for everyday demands. In addition, its user interface is raw and lacks the user-friendly functions that give users proper visibility into their jobs.

With NetworkComputer, you get a robust, enterprise-grade job scheduler that scales with all workload types, delivering high performance and capacity as well as GUIs that give users maximum visibility into their jobs. You also receive enterprise-level service, so you know you have full customer support for your mission-critical needs. Today, NetworkComputer is the job scheduler of choice for major Fortune companies for these reasons. To get started with NetworkComputer, visit the Runtime Design Automation website and sign up.
More informationHPC Introductory Course - Exercises
HPC Introductory Course - Exercises The exercises in the following sections will guide you understand and become more familiar with how to use the Balena HPC service. Lines which start with $ are commands
More informationIntroduction to SLURM & SLURM batch scripts
Introduction to SLURM & SLURM batch scripts Anita Orendt Assistant Director Research Consulting & Faculty Engagement anita.orendt@utah.edu 23 June 2016 Overview of Talk Basic SLURM commands SLURM batch
More informationHTCondor Essentials. Index
HTCondor Essentials 31.10.2017 Index Login How to submit a job in the HTCondor pool Why the -name option? Submitting a job Checking status of submitted jobs Getting id and other info about a job
More informationIntroduction to SLURM on the High Performance Cluster at the Center for Computational Research
Introduction to SLURM on the High Performance Cluster at the Center for Computational Research Cynthia Cornelius Center for Computational Research University at Buffalo, SUNY 701 Ellicott St Buffalo, NY
More informationBright Cluster Manager: Using the NVIDIA NGC Deep Learning Containers
Bright Cluster Manager: Using the NVIDIA NGC Deep Learning Containers Technical White Paper Table of Contents Pre-requisites...1 Setup...2 Run PyTorch in Kubernetes...3 Run PyTorch in Singularity...4 Run
More informationSlurm Version Overview
Slurm Version 18.08 Overview Brian Christiansen SchedMD Slurm User Group Meeting 2018 Schedule Previous major release was 17.11 (November 2017) Latest major release 18.08 (August 2018) Next major release
More informationHighly Available Forms and Reports Applications with Oracle Fail Safe 3.0
Highly Available Forms and Reports Applications with Oracle Fail Safe 3.0 High Availability for Windows NT An Oracle Technical White Paper Robert Cheng Oracle New England Development Center System Products
More informationUsing Docker in High Performance Computing in OpenPOWER Environment
Using Docker in High Performance Computing in OpenPOWER Environment Zhaohui Ding, Senior Product Architect Sam Sanjabi, Advisory Software Engineer IBM Platform Computing #OpenPOWERSummit Join the conversation
More informationWorkload management at KEK/CRC -- status and plan
Workload management at KEK/CRC -- status and plan KEK/CRC Hiroyuki Matsunaga Most of the slides are prepared by Koichi Murakami and Go Iwai CPU in KEKCC Work server & Batch server Xeon 5670 (2.93 GHz /
More informationAnnouncement. Exercise #2 will be out today. Due date is next Monday
Announcement Exercise #2 will be out today Due date is next Monday Major OS Developments 2 Evolution of Operating Systems Generations include: Serial Processing Simple Batch Systems Multiprogrammed Batch
More informationIntroduction to RCC. September 14, 2016 Research Computing Center
Introduction to HPC @ RCC September 14, 2016 Research Computing Center What is HPC High Performance Computing most generally refers to the practice of aggregating computing power in a way that delivers
More informationIntroduction to BioHPC
Introduction to BioHPC New User Training [web] [email] portal.biohpc.swmed.edu biohpc-help@utsouthwestern.edu 1 Updated for 2015-06-03 Overview Today we re going to cover: What is BioHPC? How do I access
More informationIntroduction to the Cluster
Follow us on Twitter for important news and updates: @ACCREVandy Introduction to the Cluster Advanced Computing Center for Research and Education http://www.accre.vanderbilt.edu The Cluster We will be
More informationIntroduction to RCC. January 18, 2017 Research Computing Center
Introduction to HPC @ RCC January 18, 2017 Research Computing Center What is HPC High Performance Computing most generally refers to the practice of aggregating computing power in a way that delivers much
More informationCRUK cluster practical sessions (SLURM) Part I processes & scripts
CRUK cluster practical sessions (SLURM) Part I processes & scripts login Log in to the head node, clust1-headnode, using ssh and your usual user name & password. SSH Secure Shell 3.2.9 (Build 283) Copyright
More informationBright Cluster Manager
Bright Cluster Manager Using Slurm for Data Aware Scheduling in the Cloud Martijn de Vries CTO About Bright Computing Bright Computing 1. Develops and supports Bright Cluster Manager for HPC systems, server
More informationDay 9: Introduction to CHTC
Day 9: Introduction to CHTC Suggested reading: Condor 7.7 Manual: http://www.cs.wisc.edu/condor/manual/v7.7/ Chapter 1: Overview Chapter 2: Users Manual (at most, 2.1 2.7) 1 Turn In Homework 2 Homework
More informationDuke Compute Cluster Workshop. 10/04/2018 Tom Milledge rc.duke.edu
Duke Compute Cluster Workshop 10/04/2018 Tom Milledge rc.duke.edu rescomputing@duke.edu Outline of talk Overview of Research Computing resources Duke Compute Cluster overview Running interactive and batch
More informationJob Management System Extension To Support SLAAC-1V Reconfigurable Hardware
Job Management System Extension To Support SLAAC-1V Reconfigurable Hardware Mohamed Taher 1, Kris Gaj 2, Tarek El-Ghazawi 1, and Nikitas Alexandridis 1 1 The George Washington University 2 George Mason
More informationMoab Passthrough. Administrator Guide February 2018
Moab Passthrough Administrator Guide 9.1.2 February 2018 2018 Adaptive Computing Enterprises, Inc. All rights reserved. Distribution of this document for commercial purposes in either hard or soft copy
More informationGet your own Galaxy within minutes
Get your own Galaxy within minutes Enis Afgan, Nitesh Turaga, Nuwan Goonasekera GCC 2016 Bloomington, IN Access slides from bit.ly/gcc2016_usecloud Today s agenda Introduction Hands on, part 1 Launch your
More informationPBS Pro Documentation
Introduction Most jobs will require greater resources than are available on individual nodes. All jobs must be scheduled via the batch job system. The batch job system in use is PBS Pro. Jobs are submitted
More informationCycleServer Grid Engine Support Install Guide. version
CycleServer Grid Engine Support Install Guide version 1.34.4 Contents CycleServer Grid Engine Guide 1 Administration 1 Requirements 1 Installation 1 Monitoring Additional Grid Engine Clusters 3 Monitoring
More informationCS 471 Operating Systems. Yue Cheng. George Mason University Fall 2017
CS 471 Operating Systems Yue Cheng George Mason University Fall 2017 Outline o Process concept o Process creation o Process states and scheduling o Preemption and context switch o Inter-process communication
More informationTroubleshooting Jobs on Odyssey
Troubleshooting Jobs on Odyssey Paul Edmon, PhD ITC Research CompuGng Associate Bob Freeman, PhD Research & EducaGon Facilitator XSEDE Campus Champion Goals Tackle PEND, FAIL, and slow performance issues
More informationAndrej Filipčič
Singularity@SiGNET Andrej Filipčič SiGNET 4.5k cores, 3PB storage, 4.8.17 kernel on WNs and Gentoo host OS 2 ARC-CEs with 700TB cephfs ARC cache and 3 data delivery nodes for input/output file staging
More informationInstalling and Configuring VMware Identity Manager Connector (Windows) OCT 2018 VMware Identity Manager VMware Identity Manager 3.
Installing and Configuring VMware Identity Manager Connector 2018.8.1.0 (Windows) OCT 2018 VMware Identity Manager VMware Identity Manager 3.3 You can find the most up-to-date technical documentation on
More informationRHRK-Seminar. High Performance Computing with the Cluster Elwetritsch - II. Course instructor : Dr. Josef Schüle, RHRK
RHRK-Seminar High Performance Computing with the Cluster Elwetritsch - II Course instructor : Dr. Josef Schüle, RHRK Overview Course I Login to cluster SSH RDP / NX Desktop Environments GNOME (default)
More informationChapter 8. Operating System Support. Yonsei University
Chapter 8 Operating System Support Contents Operating System Overview Scheduling Memory Management Pentium II and PowerPC Memory Management 8-2 OS Objectives & Functions OS is a program that Manages the
More information07 - Processes and Jobs
07 - Processes and Jobs CS 2043: Unix Tools and Scripting, Spring 2016 [1] Stephen McDowell February 10th, 2016 Cornell University Table of contents 1. Processes Overview 2. Modifying Processes 3. Jobs
More informationJob Management on LONI and LSU HPC clusters
Job Management on LONI and LSU HPC clusters Le Yan HPC Consultant User Services @ LONI Outline Overview Batch queuing system Job queues on LONI clusters Basic commands The Cluster Environment Multiple
More informationDirections in Workload Management
Directions in Workload Management Alex Sanchez and Morris Jette SchedMD LLC HPC Knowledge Meeting 2016 Areas of Focus Scalability Large Node and Core Counts Power Management Failure Management Federated
More informationSlurm Workload Manager Introductory User Training
Slurm Workload Manager Introductory User Training David Bigagli david@schedmd.com SchedMD LLC Outline Roles of resource manager and job scheduler Slurm design and architecture Submitting and running jobs
More informationIRIX Resource Management Plans & Status
IRIX Resource Management Plans & Status Dan Higgins Engineering Manager, Resource Management Team, SGI E-mail: djh@sgi.com CUG Minneapolis, May 1999 Abstract This paper will detail what work has been done
More informationOVERVIEW OF THE SAS GRID
OVERVIEW OF THE SAS GRID Host Caroline Scottow Presenter Peter Hobart MANAGING THE WEBINAR In Listen Mode Control bar opened with the white arrow in the orange box Copyr i g ht 2012, SAS Ins titut e Inc.
More informationHPC Introductory Training. on Balena by Team Bath
HPC Introductory Training on Balena by Team HPC @ Bath What is HPC and why is it different to using your desktop? High Performance Computing most generally refers to the practice of aggregating computing
More informationLAB. Preparing for Stampede: Programming Heterogeneous Many-Core Supercomputers
LAB Preparing for Stampede: Programming Heterogeneous Many-Core Supercomputers Dan Stanzione, Lars Koesterke, Bill Barth, Kent Milfeld dan/lars/bbarth/milfeld@tacc.utexas.edu XSEDE 12 July 16, 2012 1 Discovery
More informationTesting SLURM open source batch system for a Tierl/Tier2 HEP computing facility
Journal of Physics: Conference Series OPEN ACCESS Testing SLURM open source batch system for a Tierl/Tier2 HEP computing facility Recent citations - A new Self-Adaptive dispatching System for local clusters
More informationand how to use TORQUE & Maui Piero Calucci
Queue and how to use & Maui Scuola Internazionale Superiore di Studi Avanzati Trieste November 2008 Advanced School in High Performance and Grid Computing Outline 1 We Are Trying to Solve 2 Using the Manager
More informationCOPYRIGHTED MATERIAL. Introducing VMware Infrastructure 3. Chapter 1
Mccain c01.tex V3-04/16/2008 5:22am Page 1 Chapter 1 Introducing VMware Infrastructure 3 VMware Infrastructure 3 (VI3) is the most widely used virtualization platform available today. The lineup of products
More informationQueue systems. and how to use Torque/Maui. Piero Calucci. Scuola Internazionale Superiore di Studi Avanzati Trieste
Queue systems and how to use Torque/Maui Piero Calucci Scuola Internazionale Superiore di Studi Avanzati Trieste March 9th 2007 Advanced School in High Performance Computing Tools for e-science Outline
More informationHosts & Partitions. Slurm Training 15. Jordi Blasco & Alfred Gil (HPCNow!)
Slurm Training 15 Agenda 1 2 Compute Hosts State of the node FrontEnd Hosts FrontEnd Hosts Control Machine Define Partitions Job Preemption 3 4 Define Limits Define ACLs Shared resources Partition States
More informationDesign and deliver cloud-based apps and data for flexible, on-demand IT
White Paper Design and deliver cloud-based apps and data for flexible, on-demand IT Design and deliver cloud-based apps and data for flexible, on-demand IT Discover the fastest and easiest way for IT to
More informationAn introduction to checkpointing. for scientific applications
damien.francois@uclouvain.be UCL/CISM - FNRS/CÉCI An introduction to checkpointing for scientific applications November 2013 CISM/CÉCI training session What is checkpointing? Without checkpointing: $./count
More informationSlurm Support for Linux Control Groups
Slurm Support for Linux Control Groups Slurm User Group 2010, Paris, France, Oct 5 th 2010 Martin Perry Bull Information Systems Phoenix, Arizona martin.perry@bull.com cgroups Concepts Control Groups (cgroups)
More informationIntroduction to Operating Systems (Part II)
Introduction to Operating Systems (Part II) Amir H. Payberah amir@sics.se Amirkabir University of Technology (Tehran Polytechnic) Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/24 1 / 45 Computer
More informationHPC Introductory Training. on Balena by Team Bath
HPC Introductory Training on Balena by Team HPC @ Bath Housekeeping Attendance sheet Fire alarm Refreshment breaks Questions anytime lets us know if you need any assistance. Feedback at the end of the
More informationUsing Cartesius and Lisa. Zheng Meyer-Zhao - Consultant Clustercomputing
Zheng Meyer-Zhao - zheng.meyer-zhao@surfsara.nl Consultant Clustercomputing Outline SURFsara About us What we do Cartesius and Lisa Architectures and Specifications File systems Funding Hands-on Logging
More information