June 26, 2012 Explanatory meeting for users of the supercomputer system -- Overview of UGE --


1 June 26, 2012 Explanatory meeting for users of the supercomputer system -- Overview of UGE --

2 What is Univa Grid Engine (UGE)?
- Software used to construct a grid computing system; it functions as a batch job system.
- A commercial product derived from Sun Grid Engine 6.2u5 (the last open-source version).
- The main developers of SGE participated in the development of UGE.
- The commands and other interfaces for submitting jobs to UGE are the same as those of SGE.

3 Advantages of using UGE
- Multiple jobs can be executed smoothly one after another. When several users submit many jobs at the same time, UGE carries out the scheduling.
- Scheduling is carried out effectively according to the memory, CPU and other resources required by each job.
Precautions for using UGE
- UGE does not parallelize a job by itself; such functions must be provided by the job.
- If the resource demand is not declared properly when a job is submitted, a large-scale hang-up of compute hosts may occur.

4 How to use a supercomputer
1. Log into the gateway node (gw.ddbj.nig.ac.jp).
2. Execute qlogin and log into an interactive node.
3. Submit a job to UGE from the host on which qlogin was executed.
4. UGE executes the job on a node with a small load.
5. The job execution result is written to the home directory on Lustre.
6. Check the job execution result.
(Diagram: the gateway is 1 unit in an active-standby pair dedicated to communication with the outside; the interactive nodes are 8 units for professional use and 8 units for research; the batch nodes are 81 units for professional use and 208 units for research.)
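The flow above can be summarised as the following terminal session (a minimal sketch; the user name and job ID are illustrative, and test.sh is the example script introduced later in this material):
$ ssh username@gw.ddbj.nig.ac.jp   # 1. log into the gateway node
$ qlogin                           # 2. log into an interactive node through UGE
$ qsub test.sh                     # 3. submit a job to UGE from the qlogin host
$ qstat                            # 4. UGE runs the job on a lightly loaded node; check its state
$ cat ~/test.sh.o<job ID>          # 5.-6. check the result written to the home directory (Lustre)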

5 Basic terms (concept) (1)
Host (node): a physically existing computer.
Master host: the host on which the master daemon (qmaster) of UGE runs. The master daemon controls UGE; it accepts jobs, schedules them, delivers them to execution hosts, and collects the results.
Execution host: a host on which the execution daemon (execd) of UGE runs. The execution daemon receives job execution instructions from the master daemon and executes the jobs.
(Diagram: one qmaster host distributes jobs to several execd hosts.)

6 Basic terms (concept) (2)
Submit host: a host that can submit jobs to UGE. The execution hosts that can be logged into with the qlogin command are submit hosts.
Queue: the target of job submission. A queue is configured from one or more execution hosts, and there are several types of queues depending on their purpose.
Job slot: a container for executing a job, set up on each execution host. A job is submitted to a queue and is finally placed in a slot.
(Diagram: jobs pass from the submit host through qmaster to job slots on the execution hosts.)

7 Two UGE environments
The supercomputer system has the two UGE environments shown below. The appropriate environment settings are configured at login, so users do not need to configure them themselves.
UGE environment for DDBJ operation (usable with an account for DDBJ operation):
  SGE_ROOT=/home/geadmin/UGES
  SGE_CELL=uges
UGE environment for research (usable with an account for general research):
  SGE_ROOT=/home/geadmin/UGER
  SGE_CELL=uger
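Which environment is active can be confirmed after login simply by printing these variables (a minimal check; the output shown is what would be expected for the research environment):
$ echo $SGE_ROOT
/home/geadmin/UGER
$ echo $SGE_CELL
uger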

8 Types of queues (for research, as of June 26)
week_hdd.q: the default queue; if no queue/resource is specified, jobs are submitted here
week_ssd.q: used for jobs with a short execution time that use SSD
month_hdd.q: used for jobs that require a long execution time
month_ssd.q: used for jobs with a long execution time that use SSD
month_gpu.q: used for jobs that use a GPU
month_medium.q: used for jobs that use a medium node
month_fat.q: used for jobs that use a fat node
debug.q: 64 job slots, upper limit of execution time 1 day; used for checking the operation of a job
login.q: used for submitting jobs (the interactive nodes)

9 Types of queues (for professional use, as of June 10)
month_hdd.q: the default queue; if no queue/resource is specified, jobs are submitted here
debug.q: 32 job slots, upper limit of execution time 1 day; used for checking the operation of a job
login.q: 64 job slots, no upper limit of execution time; used for submitting jobs

10 Upper limit of execution time
The upper limit of execution time is set in order to resolve congestion when many jobs are waiting for execution. A job that exceeds the upper limit of execution time is killed.
The execution time is measured as elapsed wall-clock time after the job starts running; it is not CPU time, and time spent waiting in a queue is not included.
Before submitting a job, estimate its execution time using the environment for checking operation (debug.q).

11 qlogin
Before submitting a job, log in with the qlogin command from the gateway host to a host in login.q that has sufficient resources (common to both the research and the DDBJ operation environments).
$ qlogin
Your job 329 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled...
Your interactive job 329 has been successfully scheduled.
Establishing builtin session to host t217i...
$ uname -n
t217
When logging into an execution host, always use the qlogin command. Do not log in directly to execute a job, because the load-balancing mechanism would then not function properly. (*Users who log in directly are recorded.)

12 Job submission (1)
A job is submitted by creating a shell script written for UGE. An example is shown below (file name: test.sh).
#!/bin/sh
#$ -S /bin/sh
pwd
hostname
date
sleep 20
date
echo "to stderr" 1>&2
"#$" at the beginning of the second line is the prefix for specifying UGE options. The interpreter used when this shell script runs under UGE is specified with #$ -S (in this example, /bin/sh). If this line is omitted, the interpreter must instead be specified with the -S <path of interpreter> option on the command line when the job is submitted.
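For example, if the #$ -S line were removed from test.sh, the job could instead be submitted with the interpreter given on the command line (a minimal sketch of the alternative mentioned above):
$ qsub -S /bin/sh test.sh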

13 Job submission (2)
Submit the job with the qsub command:
$ qsub test.sh
The submitted job is inserted into a queue and waits for execution. The status of the submitted job can be checked with the qstat command (described later).
After execution, check the output of the job. Files recording the standard output and the standard error output of the job are written to the home directory.
$ cat ~/test.sh.o325
/lustre1/home/ddbjuser
t...
2012年 3月 21日 水曜日 11:15:01 JST
2012年 3月 21日 水曜日 11:15:21 JST
$ cat ~/test.sh.e325
to stderr

14 Main options of qsub (1)
-S <path of interpreter>
  Specifies the path of the interpreter used to execute the script file. Interpreters of script languages such as Perl or Ruby can be specified in addition to a shell, e.g. -S /bin/sh (sh) or -S /usr/local/bin/perl (Perl).
-cwd
  The job is executed not in the home directory but in the directory where the qsub command was executed. If this option is specified, the files containing the standard output and the standard error output are also written to the directory where qsub was executed.
-o <output destination of standard output>
-e <output destination of standard error output>
  Specify the output destinations of the job's standard output and standard error output. If you do not want them written to a file, specify /dev/null as the destination, e.g. -o /dev/null -e /dev/null.
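Combining these options, a job can be run in the submission directory with its output files written there under chosen names, for example (a minimal sketch; the file names out.log and err.log are only illustrative):
$ qsub -S /bin/sh -cwd -o out.log -e err.log test.sh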

15 Main options of qsub (2)
-N <alias name of job>
  The job name shown by qstat is changed to the specified name. If this option is not specified, the job name is the same as the script name.
-l <resource demand 1>,<resource demand 2>,...
-l <resource demand 1> -l <resource demand 2> ...
  Resource demands can be given as a comma-separated list or as repeated -l options. They are mainly used for selecting a queue or changing the upper limit of memory usage. Details are described later.
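For example, the two kinds of options can be combined as follows (a sketch; the job name "myjob" is illustrative, and the resources month, s_vmem and mem_req are the ones described later in this material):
$ qsub -N myjob -l month -l s_vmem=8g -l mem_req=8g test.sh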

16 Checking job status
The status of a submitted job is checked with the qstat command.
If the job is waiting in a queue, "qw" is displayed in the state column:
$ qstat
job-ID  prior  name     user      state  submit/start at
...     ...    test.sh  ddbjuser  qw     03/19/2012 ...
While the job is being executed, "r" is displayed:
$ qstat
job-ID  prior  name     user      state  submit/start at
...     ...    test.sh  ddbjuser  r      03/19/2012 ...
The main states are shown below. A state can also be displayed as a combination of more than one of them.
r   The job is being executed on an execution host
qw  The job is waiting in a queue
t   The job is being transferred to an execution host
E   An error occurred in the job
d   The job is being deleted

17 Main options of qstat
-f
  The queue usage status is displayed in addition. e.g. qstat -f
-u [uid]
  The jobs of the specified user [uid] are displayed. If "*" is specified, the jobs of all users are displayed. e.g. qstat -u "*"
-j [jobid]
  Detailed information on the specified job [jobid] is displayed; the reason for the error state Eqw can be checked here. e.g. qstat -j <jobid>
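A typical way to investigate a job stuck in the Eqw state is to list it and then ask for its details (a sketch; 325 stands for the job ID reported by qsub):
$ qstat -u "*"     # confirm the job and its state
$ qstat -j 325     # the reason for the error is shown in the detailed output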

18 Deletion of a job
The qdel command is used to delete a job. A job is deleted by specifying its job ID or a UID.
When a job ID is specified (specify only the job ID):
$ qsub test.sh
Your job 326 ("test.sh") has been submitted
$ qdel 326
ddbjuser has deleted job 326
When a UID is specified (with the -u option):
$ qsub test.sh
Your job 327 ("test.sh") has been submitted
$ qsub test.sh
Your job 328 ("test.sh") has been submitted
$ qdel -u ddbjuser
ddbjuser has registered the job 327 for deletion
ddbjuser has registered the job 328 for deletion

19 Checking job execution results
The details of a job that has finished are checked with the qacct command. The resources actually consumed by the job can be checked, among other things.
$ qacct -j 325
==============================================================
qname        week_hdd.q
hostname     t165i
group        se
owner        ddbjuser
project      NONE
(*omitted*)
cpu          ...
mem          ...
io           ...
iow          ...
maxvmem      ...M
arid         undefined

20 Precautions before submitting a job
* Before submitting a large number of jobs, always run a test first.
  >> If memory is insufficient, several hosts can hang up.
  >> A large number of error jobs can overload UGE.
* Do not place input files or final output files in a directory local to each host, such as /tmp or /ssd.
  >> The input could not be read by the host on which the job is executed.
  >> The result could not be referenced after execution.
* The number of processes executed simultaneously by one job should be 1 (do not fork multiple processes from one job without using def_slot, described later).
* The number of threads of a process should be 1 (do not run a multi-threaded process without using def_slot, described later).
  >> Otherwise the load cannot be balanced and the host can hang up.

21 Switching queue usage (for research) (1)
The queue to be used can be switched by specifying a resource with the -l option.
No resource specified:
$ qsub test.sh
  week_hdd.q and week_ssd.q are used. The priority order is week_hdd.q > week_ssd.q.
"month" specified (* specify this when a long calculation time is expected):
$ qsub -l month test.sh
  month_hdd.q, month_ssd.q and month_gpu.q are used. The priority order is month_hdd.q > month_ssd.q > month_gpu.q.
"ssd" specified (* specify this when submitting a job that uses SSD):
$ qsub -l ssd test.sh
  Only week_ssd.q is used.

22 Switching queue usage (for research) (2)
"month" and "ssd" specified (* specify these when a long calculation time is expected for a job that uses SSD):
$ qsub -l month -l ssd test.sh
  month_ssd.q and month_gpu.q are used. The priority order is month_ssd.q > month_gpu.q.
"month" and "gpu" specified (* specify these when submitting a job that uses a GPU):
$ qsub -l month -l gpu test.sh
  Only month_gpu.q is used.
  * When a host with a GPU is used, be sure to specify "-l month".
  * Only one job that requires a GPU can run on each host with a GPU.
"month" and "medium" specified (* specify these when submitting a job that uses a medium node):
$ qsub -l month -l medium test.sh
  Only month_medium.q is used.
  * When a medium node is used, be sure to specify "-l month".

23 Switching queue usage (for research) (3)
"month" and "fat" specified (* specify these when submitting a job that uses a fat node):
$ qsub -l month -l fat test.sh
  Only month_fat.q is used.
  * When a fat node is used, be sure to specify "-l month".
"debug" specified (* specify this when checking the operation of a job):
$ qsub -l debug test.sh
  debug.q is used.
"debug" and "gpu" specified (* specify these when checking the operation of a job that uses a GPU):
$ qsub -l debug -l gpu test.sh
  A host with a GPU in debug.q is used.

24 Switching queue usage (for research) (4)
*Caution* To use a GPU, medium, or fat node, be sure to specify "month" in addition to the corresponding resource ("gpu", "medium", or "fat"). With the current queue configuration, all of the GPU, medium, and fat nodes are allocated only to queues for time-consuming calculations, so the resource "month" must be specified to use those queues. If "month" is not specified, no current queue satisfies the resource specification; the submission itself succeeds, but the job is never executed.

25 Switching queue usage (for professional use)
No resource specified:
$ qsub test.sh
  month_hdd.q is used.
"debug" specified (* specify this when checking the operation of a job):
$ qsub -l debug test.sh
  debug.q is used.

26 When a large amount of memory is used (1)
The memory available to a UGE job is restricted to 4 GB by default. When a large amount of memory is used, declare the memory usage with the -l option.
When 8 GB of memory is used by one job:
$ qsub -l s_vmem=8g -l mem_req=8g test.sh
When 128 GB of memory is used by one job on a medium node:
$ qsub -l s_vmem=128g -l mem_req=128g -l month -l medium test.sh

27 When a large amount of memory is used (2)
s_vmem: declares the upper limit of virtual memory that the job may use. The job cannot use more memory than the amount specified here.
mem_req: declares the amount of memory to be used. For each execution host, the value of mem_req is kept as an index of the remaining amount of memory, and it increases or decreases according to the job execution status. It is used as an index for load balancing.
1. While a job is being executed, the host's mem_req value is decreased by the amount declared by the job (for example, a host with mem_req=64g running a job that declared 8G goes down to mem_req=56g, and a host with mem_req=2048g running a job that declared 128G goes down to mem_req=1920g).
2. When the job terminates, the amount of mem_req declared by the job is restored to the host's original value.
3. If the host's mem_req is smaller than the mem_req declared by a job, the job is not executed on that host.

28 Array job (1)
If a job is submitted as an array job, the same job can be executed repeatedly with a different parameter for each task. An array job is submitted with the -t option of qsub.
$ cat arraytest.sh
#!/bin/sh
#$ -S /bin/sh
echo ---
echo JOB_ID: ${JOB_ID}
echo SGE_TASK_ID: ${SGE_TASK_ID}
echo SGE_TASK_FIRST: ${SGE_TASK_FIRST}
echo SGE_TASK_LAST: ${SGE_TASK_LAST}
echo SGE_TASK_STEPSIZE: ${SGE_TASK_STEPSIZE}
echo ---
$ qsub -t 1-6:2 arraytest.sh
Your job-array 1031.1-6:2 ("arraytest.sh") has been submitted
$ qstat
job-ID  prior  name        user      state  submit/start at   queue             slots  ja-task-id
1031    ...    arraytest.  ddbjuser  r      03/19/2012 ...    week_hdd.q@t168i  ...    1
1031    ...    arraytest.  ddbjuser  r      03/19/2012 ...    week_hdd.q@t168i  ...    3
1031    ...    arraytest.  ddbjuser  r      03/19/2012 ...    week_hdd.q@t178i  ...    5

29 Array job (2)
$ ls arraytest.sh.o1031.*
arraytest.sh.o1031.1  arraytest.sh.o1031.3  arraytest.sh.o1031.5
$ cat arraytest.sh.o1031.1
---
JOB_ID: 1031
SGE_TASK_ID: 1
SGE_TASK_FIRST: 1
SGE_TASK_LAST: 6
SGE_TASK_STEPSIZE: 2
---
$ cat arraytest.sh.o1031.5
---
JOB_ID: 1031
SGE_TASK_ID: 5
SGE_TASK_FIRST: 1
SGE_TASK_LAST: 6
SGE_TASK_STEPSIZE: 2
---
To prevent overload of UGE, the number of jobs that one user can submit is limited; if you try to submit more jobs than this limit, an error occurs and the jobs cannot be submitted. If jobs are submitted as array jobs, the load on UGE can be reduced: with 5000 array jobs submitted, 5000 * SGE_TASK_ID tasks can be executed. There is also an upper limit on SGE_TASK_ID.
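A common way to give each task its own parameter is to use SGE_TASK_ID inside the script to select an input file. The following is a minimal sketch (the program ./myprog and the file names input_N.txt / output_N.txt are only illustrative and not part of this system):
$ cat arrayrun.sh
#!/bin/sh
#$ -S /bin/sh
#$ -cwd
# Each task processes the input file whose number matches its SGE_TASK_ID.
./myprog input_${SGE_TASK_ID}.txt > output_${SGE_TASK_ID}.txt
$ qsub -t 1-100 arrayrun.sh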

30 MPI job (1)
An example of a shell script used to submit an MPI job is shown below.
$ cat mpitest.sh
#!/bin/sh
#$ -S /bin/sh
#$ -pe mpi 2-24
#$ -cwd
/usr/local/bin/mpirun -np $NSLOTS -machinefile $TMPDIR/machines ./mpitest
-pe <MPI execution environment name> <minimum parallel count>-<maximum parallel count>
  Specifies the MPI execution environment (described later) and the minimum and maximum parallel counts.
$NSLOTS
  A value between the minimum and the maximum parallel count above, decided automatically according to the free space in the queues, is set in this variable.
-machinefile $TMPDIR/machines
  The file $TMPDIR/machines is generated automatically by UGE.

31 MPI job (2)
Submit the MPI job to UGE:
$ qsub mpitest.sh
Your job 1292 ("mpitest.sh") has been submitted
$ qstat
job-ID  prior  name        user      state  submit/start at   queue             slots
1292    ...    mpitest.sh  ddbjuser  r      03/19/2012 ...    week_hdd.q@t303i  24
$ cat mpitest.sh.o1292
Hellow World from Process 0 of 24 running on t303
Hellow World from Process 1 of 24 running on t290
(*omitted*)
Hellow World from Process 19 of 24 running on t311
Main MPI execution environments:
mpi: executes parallel jobs using as many hosts as possible
mpi-fillup: executes parallel jobs using the same host whenever possible
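To pack the MPI processes onto as few hosts as possible instead, the same script can be submitted while overriding the parallel environment on the command line (a sketch; the slot range 2-24 matches the script above, and the command-line option overrides the #$ -pe line embedded in the script):
$ qsub -pe mpi-fillup 2-24 mpitest.sh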

32 Use of the parallel environment def_slot (1)
Use def_slot when submitting a job that could cause an overload if submitted as-is, such as a job that forks multiple processes or a job that runs a multi-threaded process.
$ qsub -pe def_slot 2 test.sh
The number of job slots consumed by the job is redefined by the value following "def_slot"; in this example, the job consumes two job slots. The value serves as an indication of the maximum number of processes started simultaneously by the job, or of the maximum number of threads used by a process started in the job.
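For example, a job that runs a program with four threads could reserve four slots so that the scheduler accounts for all of them (a minimal sketch; the program name and its thread option are only illustrative):
$ cat threadtest.sh
#!/bin/sh
#$ -S /bin/sh
#$ -pe def_slot 4
#$ -cwd
# The thread count given to the program matches the def_slot value above.
./mythreadedprog --threads 4
$ qsub threadtest.sh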

33 Use of the parallel environment def_slot (2)
*Cautions*
If def_slot is specified, the amount of resource demanded becomes (resource amount specified by -l) x (number of slots specified by def_slot). Note that an excessively large amount of resource may be demanded unintentionally.
If the following options are specified, the memory demand becomes 32 GB:
$ qsub -pe def_slot 4 -l max_vmem=8g -l mem_req=8g test.sh
If the resource demand is not stated explicitly, the default value is applied. In the following case, the memory demand becomes 16 GB:
$ qsub -pe def_slot 4 test.sh
In the following case, the memory demand becomes 80 GB; no thin node satisfies this condition, so the job is never executed even though it can be submitted:
$ qsub -pe def_slot 10 -l max_vmem=8g -l mem_req=8g test.sh

34 Contact information
If you have any enquiries or opinions, please contact the Supercomputer SE team, National Institute of Genetics.
Mail:
Room: w202
Extension:
...upercom-intro.html

35 Revision history
March 21, 2012: Newly created.
May 10, 2012: The number of job slots in "Types of queues" is corrected to reflect the current situation. Precautions on the use of def_slot are added.
June 18, 2012: The upper limit of execution time for month_*.q is changed from 31 days to 62 days; "Types of queues" is corrected.
June 26, 2012: The queue configuration is changed and "Types of queues" is corrected. The types of queues for professional use are reduced and "Switching queue usage (for professional use)" is modified.
