User Guide of High Performance Computing Cluster in School of Physics

Prepared by Sue Yang

This document aims to help users quickly log into the cluster, set up the software environment and get their jobs running. It does not cover the cluster's hardware and operating system (e.g. the Linux shell, file systems, etc.), nor does it cover the usage of software development tools and parallel programming. If required, those topics may be included in the future.

Connecting to the cluster

The full host name of the cluster is headnode.physics.usyd.edu.au. You can use secure shell (ssh) to connect to the cluster (using either the short name or the full name):

~ > ssh headnode

Software environments

You often need to set up the software environment for a program you wish to use on a computer system, for example adding PBS to your search PATH. This can be specified in your login shell script file, .cshrc or .bashrc. If you have already done that, you can keep using it. Otherwise, you are encouraged to use the Environment Modules package for this purpose. The package provides an easy way to customize your shell environment, especially on the fly.

To see the list of software for which an environment module is available, enter module avail:

headnode: ~ > module avail
/usr/physics/modules/3.2.8/modulefiles
IntelCompilerSuite  PBS  ROOT-v5.28  openmpi-gnu  openmpi-intel

headnode: ~ > module whatis PBS
PBS : Sets up torque and maui in your environment

You can set up an environment on the fly, e.g. for PBS:

headnode: ~ > module load PBS

or add the following line to your .cshrc to set up PBS permanently:

module load PBS
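For example, a user who wants both PBS and the Intel build of Open MPI set up at every login might keep something like this in .cshrc (a minimal sketch; the module names are taken from the module avail listing above, so substitute the modules you actually use):

# in ~/.cshrc: load the modules you need at every login
module load PBS
module load openmpi-intel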

With module load PBS in your .cshrc, PBS will be configured for you each time you log in, and you can then run PBS commands such as qsub and qstat (use man qsub to get more information about qsub).

You may need to unload a package before loading another one to avoid conflicts. For example, if you have already set up Open MPI with the Intel compilers and now want to use it with the GNU compilers, do this:

headnode: ~ > module unload openmpi-intel
headnode: ~ > module load openmpi-gnu

The command module help or man module gives more information about how to use the Environment Modules package.

Workload Management System (PBS)

This section covers the following topics: what PBS is; basic PBS user commands; available queues on the cluster; job submission and a job script template; tips for specifying resources; additional job script templates; monitoring jobs; interactive jobs; and array jobs.

What is PBS

PBS is a distributed workload management system. As such, PBS handles the management of computational workload on a set of compute nodes. PBS plays three primary roles: queuing, scheduling and monitoring jobs. From the user's perspective, PBS allows you to make more efficient use of your time: you specify the tasks you need executed, and the system takes care of running them and returning the results to you. If the available compute nodes are full, PBS holds your work and runs it when resources become available.

To use PBS, you create a batch job which you then submit to PBS. A batch job is a file (a shell script) containing the set of commands you want to run on a set of execution machines. It also contains directives which specify the characteristics (attributes) of the job and the resource requirements (e.g. number of processors, amount of memory and length of time) that your job needs. Once you create your PBS job, you can reuse it or modify it for subsequent runs.

Basic PBS user commands

headnode: ~ > qsub run-job.csh
    submit a job with the job script file run-job.csh

headnode: ~ > qstat -u uid
    display the job status for user uid only

headnode: ~ > qstat -n
    display the status of all jobs

headnode: ~ > qstat -Q
    show the various queue types

headnode: ~ > qstat -f job-id
    display detailed status for the specified running job

headnode: ~ > qdel job-id
    delete the job with ID job-id

All PBS client commands are in headnode:/usr/physics/torque/bin. Use the man page of each command for detailed usage.

Available queues on the cluster

Queue name for all physics users (jobs will run on nodes ... and 31-35): physics
Queue name for Complex Systems users (jobs will run on nodes 21-23): yossarian
Queue name for Medical Physics users (jobs will run on nodes ... and 31-35): hippocrates
Queue name for Condensed Matter Theory users (jobs will run on nodes 41-45): cmt

Job submission and Job Script Template

Job submission is done by running the PBS command qsub:

headnode: ~ > qsub run-job.csh

where run-job.csh is a batch job script which contains qsub options and the commands/programs that you want to run. Here is an example run-job.csh:

#!/bin/csh
#PBS -N MyJobName
#PBS -o demo.txt
#PBS -j oe
#PBS -q yossarian
#PBS -l nodes=1:ppn=4
#PBS -l walltime=00:01:00
#PBS -m ea
#PBS -M username@physics.usyd.edu.au
#PBS -V
cd "$PBS_O_WORKDIR"
# your commands/programs start here, for example:
hostname
exit

If you submit this job, it will generate a file demo.txt with the hostname of the node it ran on printed in it. The output may contain harmless TTY warnings related to using tcsh rather than bash.

Notes on the example run-job.csh:

#!/bin/csh
    Indicates the script will run under the C shell. Lines starting with #PBS are options of the PBS command qsub.
-N MyJobName
    The name for your job.
-o demo.txt
    The filename to which standard output from your job is written.
-j oe  (optional)
    Merges stdout and stderr into the output file. Otherwise, PBS will automatically create a separate error log.
-q yossarian
    Selects which PBS queue to use. Use the queue corresponding to your group.
-l nodes=1:ppn=4
    Specifies the CPU resources required; 4 processors on 1 node are requested here.
-l walltime=00:01:00
    The maximum wall time requested to run the job; 1 minute is requested here. Warning: if the job hasn't finished when it reaches this walltime, your job will be killed.
-m ea
    Sends an email notification when the job ends or aborts.
-M username@physics.usyd.edu.au
    Your email address.
-V
    Declares that all environment variables in qsub's environment are to be exported to the batch job. If this directive is missing, your job may be terminated because e.g. $TERM is not set.
cd "$PBS_O_WORKDIR"
    Changes to the directory from which you submitted the job (the variable $PBS_O_WORKDIR contains that path).
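Submitting this script and then checking on it might look like the following session (the job ID shown is only illustrative; qsub prints the real ID of your job when you submit):

headnode: ~ > qsub run-job.csh
2945.headnode.physics.usyd.edu.au
headnode: ~ > qstat -u username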

Tips for specifying resources

The main cluster resources are compute nodes and processors, memory and execution time. Multiple users share the resources on the cluster, so the general advice is to request resources as accurately as your job needs.

As seen above, you specify the number of nodes and number of processors with the option -l nodes=*:ppn=*. nodes designates how many nodes your job should execute on, and ppn specifies the number of processors to be allocated on each node. For example,

-l nodes=1:ppn=1
    1 processor on 1 node. This is what you should use for a non-parallel program.
-l nodes=2:ppn=1
    1 processor per node, for a total of 2 processors.
-l nodes=1:ppn=14
    14 processors on 1 node. This option will cause the queue to reject your job, because no node has that many processors.

PBS reserves the number of nodes and processors you have specified for your job, no matter how many processors your job actually runs on; those nodes and processors will not be given any new tasks while your job is running. On the other hand, if you request -l nodes=1:ppn=1 for a Matlab job which uses a matlabpool of size 8 (so it will run on 8 processors), PBS won't know your Matlab program uses 8 processors and may assign some processors on the node to other jobs. Your job and the other jobs will then share the other 7 processors, and all of the jobs will slow down. Therefore, it is important that you request the correct number of nodes and processors for your job.

Each node has about 32GB of swap space, which means that when jobs use up all physical memory, memory swapping will occur to keep jobs running. Memory swapping slows down all jobs running on the node, too. You can reserve physical memory by specifying -l mem=??mb or -l mem=??gb (the maximum amount of physical memory used by the job) to avoid using swap space. For example,

-l mem=3gb
    reserve 3GB of physical memory for your job.

A little trial and error may be required to find out how much memory your job is using. Your job will only start once there is sufficient free memory (more than 3GB in the above example), so making a sensible memory request will allow your job to run sooner. If your job needs more memory than you have specified, it will be terminated when it reaches mem. Users sometimes reserve extra memory for a job by simply requesting all (or more) processors on a node instead of specifying a memory size. This works, but it blocks other users' jobs with smaller memory usage from running.

It is recommended that you use -l walltime=* instead of -l cput=* to specify how much time your program is allowed to run for. walltime literally refers to wall time, the amount of time that a clock on the wall shows (as opposed to CPU time, the time all processors actually spend on a task). When it reaches the walltime, your job will be terminated by PBS. It is always best to make this request as accurate as possible. A combined example is sketched below.
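For instance, a job that needs 8 processors on one node, roughly 4GB of memory and up to 12 hours could request its resources with directives like these (the numbers are purely illustrative; use what your own job actually needs):

#PBS -q physics
#PBS -l nodes=1:ppn=8
#PBS -l mem=4gb
#PBS -l walltime=12:00:00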

Additional Job Script Templates

Job script for running a Matlab program:

#!/bin/csh
#PBS -q physics
#PBS -l walltime=1:00:00
#PBS -l nodes=1:ppn=4
#PBS -V
#PBS -N test-matlab
#PBS -m ea
#PBS -M
#PBS -j oe
#PBS -o output.txt
cd ${PBS_O_WORKDIR}
# run the matlab file yourmatlabscript.m:
matlab -nodisplay -r "yourmatlabscript, exit"

Job script for running an MPI program:

#!/bin/csh
#PBS -q physics
#PBS -l walltime=10:00:00
#PBS -l nodes=4:ppn=2
#PBS -V
#PBS -N test-mpi
#PBS -m ea
#PBS -M firstname.lastname@sydney.edu.au
#PBS -j oe
#PBS -o output.txt
cd "$PBS_O_WORKDIR"
mpirun -n 8 yourmpicode   # n = nodes x ppn (see resource request)
exit

Monitoring jobs

Use the command qstat -n to view the status of all submitted jobs. Alternatively, you can monitor the execution of your job by using qload or qtop. By default, qload shows you a list of all jobs currently in the queue, a summary of which users are using the system and information on the workload over the cluster. For example,

headnode: ~ > qload
Job ID   Job name   Owner   Queue   N/CPU   Time remaining   Status
...      Sensor     sxy     cmt     5/25    1h 00m 00s       Running
USER LOAD 1- SXY/XUE YANG (25 CORES) 1h 00m 00s remaining
AVAILABILITY: Medical 0/0, Complex 0/0, CMT 0/0
[per-node availability report follows: one line per node showing the node name, the cores in use, and the free memory]

Your jobs are colored red in the node availability report, so you can see which nodes your job is running on.

Several switches are available for qload:

-a        view jobs from all users, not just yourself
-u USER   view jobs from a different user, and highlight their jobs instead of yours. If you combine -u and -a, it will show jobs from all users, with highlights for the user specified with -u.
-s        only show a summary of node availability (to quickly check available resources)

If you want to delete your job before it finishes, use the qdel command and provide your Job ID from qload. To remove the job Sensor owned by sxy as shown above, user sxy would run:

sxy@headnode: ~ > qdel <Job ID>

Interactive jobs

You can start an interactive session via PBS by using qsub -I. This will create an interactive job, and you will be given a shell on a compute node as though you had used ssh. For example:

headnode: ~ > qsub -I -q physics
qsub: waiting for job 2945.headnode.physics.usyd.edu.au to start
qsub: job 2945.headnode.physics.usyd.edu.au ready
node02: ~ >

This is ideal for compiling and testing code. When using an interactive job, you can specify the number of nodes and CPUs to lock out (although requesting more than one node for an interactive job is only useful if you are going to be using mpirun). For example,

user@headnode: ~ > qsub -I -l nodes=1:ppn=8

would start an interactive job that locks out an entire node. Interactive jobs will also appear in qload.

Please do not use interactive jobs to perform unattended runs (e.g. with batch or screen). Interactive jobs are ONLY for attended interactive use. By default, interactive jobs will terminate after 1 hour. You can increase this by setting walltime with the -l flag, just the same as in a PBS script file. Please do not start interactive jobs with excessive walltime requests.
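For example, the following would request a two-processor interactive session for up to four hours (the queue and time limit here are only illustrations; use your group's queue and a sensible walltime):

user@headnode: ~ > qsub -I -q physics -l nodes=1:ppn=2,walltime=4:00:00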

Array jobs

Array jobs are one of the most powerful features of PBS for single-CPU jobs and are a very compelling reason for many users to learn and switch to the PBS system. They are useful when you want to run the same program many times, operating on different input files or with different input arguments. Array jobs allow you to quickly submit all of the jobs at once, and several instances of your job will run at the same time.

For example, suppose I had a directory with files data1.csv, data2.csv and data3.csv, and I wanted to run my program myprog FILE on each of them. I can do this very easily using the -t option:

#!/bin/csh
#PBS -N MyJobName
#PBS -o demo.txt
#PBS -q yossarian
#PBS -l nodes=1:ppn=4
#PBS -l walltime=00:01:00
#PBS -m ea
#PBS -M username@physics.usyd.edu.au
#PBS -V
#PBS -t 1-3
cd "$PBS_O_WORKDIR"
myprog data${PBS_ARRAYID}.csv

The -t switch instructs PBS to submit this as an array job. You can specify a range of indices (1-3) or individual indices (1,3,5). For each index, PBS creates a separate job. Submitting this script will cause 3 jobs to be created, each of them requesting 4 CPUs on 1 node. The variable $PBS_ARRAYID stores the value of the array index in each submitted job, so each of the 3 jobs will run with a different value of $PBS_ARRAYID. In this way, myprog will run on each of the 3 data files, even though only one script was submitted to PBS.

You can of course do fancier things with the index, such as using more sophisticated scripting to operate on the array ID before calling your program; a small sketch follows at the end of this section. Another useful way to use the array ID is as an argument to a Matlab function. For example, if the command in the PBS script was

matlab -nodisplay -r "mymatlabscript(${PBS_ARRAYID});exit"

then mymatlabscript.m would be run for each of the different array ID values. You can then write code in Matlab to decide what each of the array ID values will do.
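As one sketch of such scripting (params.txt and myprog are hypothetical), the array ID can pick out one line of a parameter file, so that each array element runs the program with a different set of arguments. These lines would replace the last line of the array job script above:

# use the array index to select line number $PBS_ARRAYID from params.txt
set args = `sed -n "${PBS_ARRAYID}p" params.txt`
myprog $args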
