IBM Scheduler for High Throughput Computing on IBM Blue Gene/P

Table of Contents

Introduction
Architecture
    simple_sched daemon
    startd daemon
    End-user commands
Personal HTC Scheduler
Using HTC Scheduler with Tivoli Workload Scheduler LoadLeveler
    Using HTC Scheduler with LoadLeveler version 3.5 and later
    Using HTC Scheduler with LoadLeveler before version 3.5
Service setup
Configuration
    Configuration options
Daemons
    simple_sched daemon
        Command-line options
        Shutting down
    startd daemon
        Command-line options
        Submit plug-in
        Shutting down
End-user commands
    qcmd
        Commands
        Immediate mode
        Interactive mode
        Response format
            Submit ID response
            Submit status response
            Scheduler status response
            Request rejected response
    qsub
    qstat
    qdel
    Submitted job states
        Note about when state info is available
run_simple_sched_jobs
    Command files
    Output
    Signal handling
    Positional parameters
    LoadLeveler integration
    Configuration

IBM Scheduler for HTC on IBM Blue Gene/P

Introduction

The HTC Scheduler is a simple scheduler for High Throughput Computing (HTC) jobs on Blue Gene/P. HTC on Blue Gene/P provides the ability to run independent, single-node tasks on each node in a partition. For information on the setup, configuration, and use of HTC on Blue Gene/P, refer to the IBM System Blue Gene Solution: Blue Gene/P System Administration Redbook (SG ) and the IBM System Blue Gene Solution: Blue Gene/P Application Development Redbook (SG ).

An HTC application may involve more tasks than there are nodes in the partition. In this situation, some tasks must wait until another task finishes before they can be submitted. A resource manager, or scheduler, automates this process. The HTC Scheduler is a resource scheduler designed specifically for the Blue Gene/P HTC environment, and it can run a large number of HTC jobs on a Blue Gene/P system reliably and efficiently.

The HTC Scheduler consists of the end-user command-line utilities (qsub, qstat, qdel, and qcmd), the simple_sched daemon, and the startd daemon. A utility for running a batch of jobs through a personal instance of the HTC Scheduler is also provided.

Architecture

Figure 1 shows the architecture of the HTC Scheduler, which includes a single simple_sched daemon, multiple startd daemons, and several instances of the end-user commands (qcmd, qsub, qstat, and qdel).

[Figure 1: HTC Scheduler architecture]

simple_sched daemon

This daemon waits for the client programs (qcmd, qsub, qstat, qdel, and startd) to contact it. When a startd client connects, simple_sched puts it into a pool of startd clients to which it can assign new jobs. When an end-user program makes a request, simple_sched handles the request. For example, for a new job request (sent by qsub), it assigns the new job a submit ID and puts the job on a queue; when the job reaches the front of the queue, simple_sched assigns the submitted command to a startd client. The simple_sched daemon can run on the service node or on a front end node.

startd daemon

The startd daemon connects to the simple_sched daemon. When simple_sched sends it a job to run, startd forks a child process, which sets up the environment (setting the gid and uid) and then execs submit with the job-specific command-line options. As the submit process runs, it notifies startd of the job's state through the submit plug-in (for example, the HTC Scheduler's submit plug-in is called when the job ID is assigned). When the submit process ends, startd retrieves its exit status and sends the job result information to simple_sched. A single startd process can have multiple submit processes running at the same time.

Because startd runs submit, the system it runs on (typically a front end node) must have the submit multiplexer (submit_mux) configured and running.

End-user commands

The end-user commands qcmd, qsub, qstat, and qdel are used to send commands to the simple_sched daemon. There are commands for submitting a new job, getting the status of a submitted job, canceling a job, and performing administrative functions. These commands are typically run from a front end node, but can also be compiled to run on a workstation.
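The dispatching described above (submit IDs, a FIFO queue, and a pool of connected startd clients) can be pictured with a small Python model. This is a sketch under simplifying assumptions, not the simple_sched implementation: the class and method names are hypothetical, each startd is given a fixed number of free submit slots, and networking, partition modes, and failures are ignored.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Optional


@dataclass
class Job:
    submit_id: int
    command: str
    state: str = "QUEUED"
    startd: Optional[str] = None


@dataclass
class ToyScheduler:
    """Hypothetical model of simple_sched's dispatching: a FIFO job queue
    plus a pool of connected startd clients, each with some free submit slots."""
    free_slots: Dict[str, int] = field(default_factory=dict)
    queue: Deque[Job] = field(default_factory=deque)
    jobs: Dict[int, Job] = field(default_factory=dict)
    next_id: int = 1

    def connect_startd(self, name: str, slots: int) -> None:
        # A connecting startd joins the pool that jobs can be assigned to.
        self.free_slots[name] = slots
        self._dispatch()

    def submit(self, command: str) -> int:
        # A new job request (as sent by qsub) gets a submit ID and is queued.
        job = Job(self.next_id, command)
        self.next_id += 1
        self.jobs[job.submit_id] = job
        self.queue.append(job)
        self._dispatch()
        return job.submit_id

    def job_finished(self, submit_id: int) -> None:
        # startd reports the result; its submit slot becomes free again.
        job = self.jobs[submit_id]
        job.state = "COMPLETED"
        self.free_slots[job.startd] += 1
        self._dispatch()

    def _dispatch(self) -> None:
        # Hand jobs from the front of the queue to any startd with a free slot.
        while self.queue:
            name = next((n for n, s in self.free_slots.items() if s > 0), None)
            if name is None:
                return
            job = self.queue.popleft()
            job.state, job.startd = "ASSIGNED", name
            self.free_slots[name] -= 1
```

Jobs submitted before any startd connects simply wait in the queue; they are assigned as soon as a startd with a free slot joins the pool.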

Personal HTC Scheduler

Users may want to run HTC jobs on their own partition using a personal instance of the HTC Scheduler (possibly under the direction of the LoadLeveler scheduler). The run_simple_sched_jobs command makes this easier: it starts a personal instance of the HTC Scheduler and startd, executes commands either specified in command files or read from stdin, and exits when all the commands have completed. It also creates a personal configuration file that can be used when submitting jobs externally.

In this example, a user wants to run the location program several times with different arguments. First, create a file in which each line contains the program to run followed by its arguments. For example, the text file cmds.run contains one line for each run:

    -- location argsfor1
    -- location argsfor2
    -- location argsfor3
    ...

To send the program's stdout and stderr to a different location, put the -stdout-file and -stderr-file parameters before the executable. This feature can also be used to discard output:

    -stdout-file=loc_out1.out -stderr-file=loc_out1.err -- location.aout argsfor1
    -stdout-file=/dev/null -stderr-file=/dev/null -- location.aout argsfor2

The first step in running jobs using run_simple_sched_jobs is to boot the partition. The htcpartition utility boots a partition in HTC mode from the command line; refer to the IBM System Blue Gene Solution: Blue Gene/P Application Development Redbook (SG ) for full documentation of htcpartition. The --configfile parameter tells htcpartition to create a file from which run_simple_sched_jobs can read the pool name and pool size.
    $ htcpartition --boot --partition R00-M0-N14 --mode SMP --configfile my_config.cfg

Use run_simple_sched_jobs to start a personal instance of the HTC Scheduler and run your commands:

    $ run_simple_sched_jobs -config my_config.cfg cmds.run

To run more commands using the same configuration file, pass -reuse-config to run_simple_sched_jobs. When you are finished, free the partition:

    $ run_simple_sched_jobs -config my_config.cfg -reuse-config cmds2.run
    $ htcpartition --free --partition R00-M0-N14

To have run_simple_sched_jobs read the commands from stdin, use - as the command file. Using this method, the commands to run can be generated by another program or script:

    $ ./gen_cmds.py | run_simple_sched_jobs -config my_config.cfg -

To have the personal HTC Scheduler continue running after all the command files have been run, use the -keep-running option. Further jobs can then be submitted to it with qsub from another shell:

    shell_1$ run_simple_sched_jobs -config my_config.cfg -keep-running
    shell_2$ qsub -config my_config.cfg my_program arg1 arg2

run_simple_sched_jobs prints a line to stdout whenever it is notified that a command completed, either successfully or unsuccessfully:

    1-1 is COMPLETED exit status 0
    ...

The first number is the request ID (which identifies the command's line in the command files) and the second is the submit ID assigned by the simple_sched daemon. Note that commands may complete in a different order than they were submitted. Once all the commands in the command files have completed, run_simple_sched_jobs prints a summary containing the number of jobs that completed successfully, the number that completed with a non-zero exit status, and the number that failed to run due to an error. For details on using run_simple_sched_jobs, see the run_simple_sched_jobs chapter.
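The gen_cmds.py script used in the stdin example is not supplied with the HTC Scheduler. The following is a minimal hypothetical version, assuming you want one command-file line per argument set, with per-run output files named like the loc_out1.out example; the function name command_lines and its prefix parameter are illustrative.

```python
#!/usr/bin/env python3
"""Hypothetical gen_cmds.py: write run_simple_sched_jobs command lines to stdout."""
from typing import Iterable, Iterator


def command_lines(program: str, arg_sets: Iterable[str],
                  prefix: str = "loc_out") -> Iterator[str]:
    """Yield one command-file line per argument set, numbered from 1."""
    for n, args in enumerate(arg_sets, start=1):
        # Output redirection options go before the "--" that precedes the executable.
        yield (f"-stdout-file={prefix}{n}.out -stderr-file={prefix}{n}.err "
               f"-- {program} {args}")


if __name__ == "__main__":
    # Example argument sets; replace these with the inputs for your runs.
    for line in command_lines("location", ["argsfor1", "argsfor2", "argsfor3"]):
        print(line)
```

Once made executable, the script can be piped into run_simple_sched_jobs exactly as in the stdin example above.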

Using HTC Scheduler with Tivoli Workload Scheduler LoadLeveler

The LoadLeveler scheduler was enhanced in version 3.5 to support Blue Gene/P's HTC mode with the HTC Scheduler as a meta-scheduler, which makes integrating the HTC Scheduler into a LoadLeveler workflow much easier and more efficient. Refer to the following section that matches your version of LoadLeveler for instructions on configuring LoadLeveler and creating a job command file (JCF) to submit HTC jobs.

Figure 2 illustrates how the HTC Scheduler glides in to LoadLeveler. LoadLeveler has selected and created a partition in the Blue Gene/P using the Control System (Bridge) API. LoadLeveler's Central Manager tells a LoadLeveler startd to run the JCF. The LoadLeveler startd executes run_simple_sched_jobs, which starts a simple_sched server, an HTC Scheduler startd, and a qsub program, then reads the cmds.run input file and converts its lines into calls to qsub. The HTC Scheduler startd runs several submit processes in parallel. These contact the submit mux, which communicates with the control system processes, which in turn run the programs on the compute nodes.

[Figure 2: HTC Scheduler as LoadLeveler glide-in]

Using HTC Scheduler with LoadLeveler version 3.5 and later

LoadLeveler version 3.5 provides new features that make using the HTC Scheduler under LoadLeveler easier. This section describes the new job command file (JCF) keywords and the related LoadLeveler behavior.

The bg_partition_type JCF keyword specifies whether the partition will be booted for HPC or HTC jobs and, if it is booted for HTC jobs, the mode in which the jobs will run. The values for bg_partition_type are:

    HPC             HPC jobs (the default if bg_partition_type is not present)
    HTC_SMP         HTC jobs in SMP mode
    HTC_DUAL        HTC jobs in Dual mode
    HTC_VN          HTC jobs in Virtual Node mode
    HTC_LINUX_SMP   HTC jobs in Linux/SMP mode

Note that Linux/SMP mode might not be available on every Blue Gene/P system.

The bg_user_list keyword specifies the users that can run jobs on the partition. It can be set to a space-separated list of user names, or to the special value ALL to allow any user to run on the partition. If it is not specified, only the step owner can submit jobs to the partition.

The bg_partition and bg_user_list keywords are not inherited by other job steps.

When LoadLeveler runs the job, it sets several environment variables that run_simple_sched_jobs uses to boot the partition chosen by LoadLeveler in the correct mode. Example 1 contains a sample LoadLeveler JCF that uses run_simple_sched_jobs:

    #!/bin/bash
    #@ job_name = htc_glide_in_32
    #@ output = $(job_name).$(jobid).out
    #@ error = $(job_name).$(jobid).err
    #@ job_type = bluegene
    #@ bg_size = 32
    #@ bg_partition_type = HTC_VN
    #@ bg_user_list = user1 user2 user3
    #@ queue
    /bgsys/opt/simple_sched/bin/run_simple_sched_jobs cmds.txt

Example 1: Sample LoadLeveler 3.5 JCF

Modify the sample JCF to run your HTC application, changing the job-specific values (for example, the job name, bg_size, bg_partition_type, bg_user_list, and the command file) to suit your application. The job_type must be bluegene so that LoadLeveler runs the script on a front end node and allocates a partition.

LoadLeveler also provides commands that display the partition type and user list of an HTC partition. For example, llstatus shows the partition type as HTC (SMP) if the partition is booted in HTC mode for SMP jobs.
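When many JCFs are needed, a small helper can assemble Example 1 from parameters and reject values that are not HTC partition types. The helper below is a hypothetical Python sketch: it emits only the keywords shown in Example 1 and does not validate anything against LoadLeveler itself.

```python
from typing import Sequence

# HTC values of bg_partition_type from the list above (HPC is the default).
HTC_PARTITION_TYPES = {"HTC_SMP", "HTC_DUAL", "HTC_VN", "HTC_LINUX_SMP"}
RUNNER = "/bgsys/opt/simple_sched/bin/run_simple_sched_jobs"


def make_htc_jcf(job_name: str, bg_size: int, partition_type: str,
                 users: Sequence[str], cmd_file: str) -> str:
    """Return the text of a JCF shaped like Example 1 (hypothetical helper)."""
    if partition_type not in HTC_PARTITION_TYPES:
        raise ValueError(f"not an HTC partition type: {partition_type}")
    lines = [
        "#!/bin/bash",
        f"#@ job_name = {job_name}",
        "#@ output = $(job_name).$(jobid).out",
        "#@ error = $(job_name).$(jobid).err",
        "#@ job_type = bluegene",   # needed so LoadLeveler allocates a partition
        f"#@ bg_size = {bg_size}",
        f"#@ bg_partition_type = {partition_type}",
    ]
    if users:
        # Space-separated user names, or ["ALL"]; omitted means step owner only.
        lines.append("#@ bg_user_list = " + " ".join(users))
    lines += ["#@ queue", f"{RUNNER} {cmd_file}"]
    return "\n".join(lines) + "\n"
```

Called with the values from Example 1, the helper reproduces that JCF line for line.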

At present, LoadLeveler does not reuse partitions booted in HTC mode, even if the cache partitions option is enabled; it automatically frees the partition after the job has ended.

Using HTC Scheduler with LoadLeveler before version 3.5

To use the HTC Scheduler with LoadLeveler versions before 3.5, LoadLeveler must be configured not to cache Blue Gene partitions. This is done by setting BG_CACHE_PARTITIONS=false in the LoadL_config file; refer to the Tivoli Workload Scheduler LoadLeveler documentation for information about this configuration option. This requirement was removed in LoadLeveler 3.5.

The HTC Scheduler can be called from a LoadLeveler JCF to submit a batch of HTC jobs to a partition managed by LoadLeveler. The JCF should look like Example 2.

    #!/bin/bash
    #@ job_name = htc_glide_in_32
    #@ output = $(job_name).$(jobid).out
    #@ error = $(job_name).$(jobid).err
    #@ job_type = bluegene
    #@ bg_size = 32
    #@ queue

    # Free the partition; registered as an EXIT trap below.
    function free_partition() {
        /bgsys/drivers/ppcfloor/bin/htcpartition --free
    }

    # Configuration file that htcpartition creates for run_simple_sched_jobs.
    export RUN_JOBS_CONFIG_FILE=my_simple_sched.cfg

    # Boot the partition in HTC mode; stop if the boot fails.
    /bgsys/drivers/ppcfloor/bin/htcpartition --boot --mode SMP --configfile "$RUN_JOBS_CONFIG_FILE"
    if [ $? -ne 0 ]; then
        echo "Booting HTC partition failed."
        exit 1
    fi

    # Free the partition however the script exits.
    trap free_partition EXIT
    /bgsys/opt/simple_sched/bin/run_simple_sched_jobs cmds.txt

Example 2: Sample LoadLeveler JCF

Modify the sample JCF to run your commands, changing the job-specific values (for example, the job name, bg_size, the htcpartition mode, and the command file) to suit your application. The job_type must be bluegene so that LoadLeveler runs the script on a front end node (FEN), and the FEN that LoadLeveler chooses must have the submit_mux running. The script uses htcpartition to boot the partition; set htcpartition's --mode parameter to the mode that your application will run in (DUAL, LINUX, SMP, or VN). The script then uses run_simple_sched_jobs to run all the commands in cmds.txt. A shell trap ensures that when the script exits, htcpartition frees the partition.

Service setup

Administrators may want to provide a single pool to which users can submit HTC jobs without going through another scheduler such as LoadLeveler. In this case, follow the steps in this section to set up the HTC Scheduler to run as a system service. Once the steps are complete, end users can run qsub to submit jobs, qstat to see the status of their jobs, and qdel to remove a job from the queue. If users run only personal instances of the HTC Scheduler, service setup is not necessary.

1. Customize the configuration file

   Copy the configuration file /bgsys/opt/simple_sched/etc/simple_sched.cfg to /bgsys/local/etc/simple_sched.cfg. Edit /bgsys/local/etc/simple_sched.cfg and set the following options:

   - Set scheduler_hostname to your service node (SN) hostname, for example "mysn.mydomain".
   - Set pool_name to the pool that you will be using for HTC jobs, for example "R00-M0".
   - Set pool_size to the size and mode of the partitions in the pool, for example "1mpV" (1 midplane in VN mode).

   You will probably not have to change the other options.

2. Start the HTC Scheduler server daemon on the SN

   Create a symlink to the init script in /etc/init.d, install the daemon to start automatically at the default run levels, and start it manually:

       # ln -s /bgsys/opt/simple_sched/etc/init.d/ibm.com-simple_sched_server /etc/init.d/
       # /usr/lib/lsb/install_initd -v ibm.com-simple_sched_server
       # /etc/init.d/ibm.com-simple_sched_server start

3. Start the HTC Scheduler startd daemons on the submit nodes

   The submit nodes are the systems where the submit mux is running (that is, the front end nodes). At least one of these systems must be designated to run the startd daemon; the number of systems required depends on the size of the pool.

   The startd daemons start the submit program supplied with the Blue Gene software, which must be able to load the HTC Scheduler's submit plug-in. The location of the submit plug-in, /bgsys/opt/simple_sched/lib, must therefore be configured in the dynamic linker (ld.so). This is typically done by creating a text file in /etc/ld.so.conf.d and running ldconfig.

   Create a symlink to the startd daemon init script in /etc/init.d, install the daemon to start automatically at the default run levels, and start it manually:

       # ln -s /bgsys/opt/simple_sched/etc/init.d/ibm.com-simple_sched_startd /etc/init.d/
       # /usr/lib/lsb/install_initd -v ibm.com-simple_sched_startd
       # /etc/init.d/ibm.com-simple_sched_startd start

4. Configure the end-user environment

   Change the system environment so that the end-user command-line utilities (qsub, qstat, and qdel) are available: users' PATH should include /bgsys/opt/simple_sched/bin. This is usually done by creating a script in /etc/profile.d.
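Step 1 can also be scripted instead of edited by hand. The sketch below assumes the configuration file holds one option=value entry per line (as in the scheduler_service_name=12345 example in the Configuration section) and that lines beginning with # are comments; both are assumptions about the file format, and customize_config is a hypothetical helper name.

```python
import re
from typing import Dict

# Assumed format: one "option=value" entry per line; lines starting with "#"
# do not match and are treated as comments.
OPTION_LINE = re.compile(r"^\s*(\w+)\s*=")


def customize_config(template: str, settings: Dict[str, str]) -> str:
    """Return template text with the named options set (hypothetical helper)."""
    remaining = dict(settings)
    out = []
    for line in template.splitlines():
        m = OPTION_LINE.match(line)
        if m and m.group(1) in remaining:
            key = m.group(1)
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)  # comments and other options are copied unchanged
    # Options not present in the template are appended at the end.
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"
```

For step 1, the template would be the text of /bgsys/opt/simple_sched/etc/simple_sched.cfg, the settings would be the scheduler_hostname, pool_name, and pool_size values, and the result would be written to /bgsys/local/etc/simple_sched.cfg.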

Configuration

Most configuration values can be set using any of the following:

- a command-line option, for example -scheduler-service <service-name>
- an environment variable, for example SIMPLE_SCHEDULER_SERVICE=12345; export SIMPLE_SCHEDULER_SERVICE
- a line in the configuration file, for example scheduler_service_name=12345

If a value is set using more than one method, the command-line option takes precedence over the environment variable, which takes precedence over the configuration file.

The HTC Scheduler programs use a configuration file. The file that is used is, in order of preference:

1. the file specified on the command line using the -config parameter
2. the file specified by the SIMPLE_SCHED_CONFIG_FILE environment variable
3. if present, the current directory config file, ./simple_sched.cfg
4. if present, the system config file, /bgsys/local/etc/simple_sched.cfg
5. if present, the install config file, /bgsys/opt/simple_sched/etc/simple_sched.cfg

Typically, the administrator copies the configuration file from /bgsys/opt/simple_sched/etc/simple_sched.cfg to /bgsys/local/etc/simple_sched.cfg and changes any configuration options necessary for the local system.

Configuration options

This section describes the configuration options.

Configuration file name
    The configuration file to use. If not specified, the following are used, in this order:
      1. if present, the current directory config file, ./simple_sched.cfg
      2. if present, the system config file, /bgsys/local/etc/simple_sched.cfg
      3. if present, the install config file, /bgsys/opt/simple_sched/etc/simple_sched.cfg
    Format: File name, see open()
    Environment variable: SIMPLE_SCHED_CONFIG_FILE
    Command-line: -config <filename>

Scheduler service name
    The service name (port) that the server listens on and that the clients attempt to contact. The default value is "simple_htc_scheduler".
    Format: Service name, see getaddrinfo()
    Configuration file option: scheduler_service_name
    Environment variable: SIMPLE_SCHEDULER_SERVICE
    Command-line: -scheduler-service <service-name>

Scheduler host name
    The host name of the system that the server is running on.
    Format: Host name, see getaddrinfo()
    Configuration file option: scheduler_hostname
    Environment variable: SIMPLE_SCHEDULER_HOSTNAME
    Command-line: -scheduler-hostname <hostname>

Pool name
    The name of the pool to run HTC jobs on. Defaults to default_pool.
    Format: Pool name, see the Blue Gene System Administration Redbook (SG )
    Configuration file option: pool_name
    Environment variable: SIMPLE_SCHEDULER_POOL
    Command-line: -pool <pool-name>

Pool size
    Describes the partitions in the pool. For each partition in the pool, the scheduler must be told its size and mode using this format:

        [<count>][<type>][<mode>]

    where at least one of the three parts must be present, and
        count is a number (default is 1)
        type is a hardware type: n = node, nc = node card, mp = midplane, r = rack (default is n)
        mode is the mode of the partition: D = dual, L = Linux, S = SMP, V = virtual node (default is S)

    Separate partition descriptions with spaces.
    Example partition: 1ncS = 1 node card in SMP mode (32 nodes to run on)
    Example pool: 1mpS 1ncV = 1 midplane booted in SMP mode and 1 node card booted in virtual node mode
    Format: Pool size, see description
    Configuration file option: pool_size
    Environment variable: SIMPLE_SCHEDULER_POOL_SIZE
    Command-line: -pool-size <pool-size>

Submit path
    The path to the submit program. Defaults to /bgsys/drivers/ppcfloor/bin/submit.
    Format: Executable name, see exec()
    Configuration file option: submit_path
    Environment variable: SIMPLE_SCHEDULER_SUBMIT_PATH
    Command-line: -submit-path <filename>

Submit options
    Additional options to pass when calling submit. When startd executes submit, it adds these options to the submit command line in addition to the arguments it uses itself. The default is empty. If the options are not valid, submitted jobs will fail.
    Format: Command-line options, for example "-trace 0"
    Configuration file option: submit_args
    Environment variable: SIMPLE_SCHEDULER_SUBMIT_ARGS
    Command-line: -submit-args <arguments>

simple_sched daemon PID file
    The path to use for simple_sched's PID file. Defaults to /var/run/simple_sched.pid.
    Format: File name
    Configuration file option: startd_pid_file
    Environment variable: SIMPLE_SCHEDULER_PID_FILE
    Command-line: -pid-file <filename>

startd daemon PID file
    The path to use for startd's PID file. Defaults to /var/run/startd.pid.
    Format: File name
    Configuration file option: startd_pid_file
    Environment variable: SIMPLE_SCHEDULER_STARTD_PID_FILE
    Command-line: -pid-file <filename>

Verbose
    The verbose level for log output. If the option is not present, no logging is done. If it is present with no value, the level is "notice". The available levels, from most to least verbose, are:
        debug, D, 7
        info, I, 6
        notice, N, 5
        warning, W, 4
        err, E, 3
        crit, 2 (not used)
        alert, 1 (not used)
        emerg, 0 (not used)
    Format: Verbose level, see description
    Command-line: -verbose[=<level>]
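The pool_size format described above can be checked mechanically before it goes into the configuration file. The sketch below is a hypothetical Python helper. The 32 nodes per node card come from the 1ncS example; the 512 nodes per midplane and 1024 per rack are the standard Blue Gene/P sizes and are not stated in this section. The node count it reports ignores the partition mode.

```python
import re
from typing import List, NamedTuple

# Compute nodes per hardware type. 32 per node card matches the 1ncS example;
# 512 per midplane and 1024 per rack are the standard Blue Gene/P sizes.
NODES_PER_TYPE = {"n": 1, "nc": 32, "mp": 512, "r": 1024}

# [<count>][<type>][<mode>]; "nc" is listed before "n" so "1ncS" is a node card.
TOKEN = re.compile(r"(\d*)(nc|mp|n|r)?([DLSV]?)")


class Partition(NamedTuple):
    quantity: int
    hw_type: str
    mode: str

    @property
    def nodes(self) -> int:
        return self.quantity * NODES_PER_TYPE[self.hw_type]


def parse_pool_size(spec: str) -> List[Partition]:
    """Parse a pool_size value such as "1mpS 1ncV" (hypothetical helper)."""
    partitions = []
    for token in spec.split():              # descriptions are space-separated
        m = TOKEN.fullmatch(token)
        if m is None:
            raise ValueError(f"bad partition description: {token!r}")
        count_text, hw_type, mode = m.groups()
        quantity = int(count_text) if count_text else 1   # count defaults to 1
        if quantity < 1:
            raise ValueError(f"count must be at least 1: {token!r}")
        # type defaults to n (node); mode defaults to S (SMP)
        partitions.append(Partition(quantity, hw_type or "n", mode or "S"))
    if not partitions:
        raise ValueError("pool_size is empty")
    return partitions
```

For the example pool 1mpS 1ncV, the helper returns a midplane in SMP mode and a node card in virtual node mode, 544 nodes in total.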

Daemons

This section provides details on the daemons that implement the HTC Scheduler.

simple_sched daemon

This section provides details on the simple_sched daemon.

Command-line options

In addition to the command-line options that override configuration options, the following options are available when starting the simple_sched daemon.

-accept-sd=sd
    The simple_sched daemon accepts connections on the supplied socket descriptor, which must be an integer. The default is to open a socket on which to accept client connections.
-log-to-stdout
    The simple_sched daemon logs to stdout. The default is to log using the syslog() API.
-suspend
    The simple_sched daemon starts in the suspended state and does not assign jobs to startd daemons until it is resumed (use qcmd resume). The default is to start in the running state.
-pick-port
    The simple_sched daemon picks an ephemeral port to use. The default is to use the configured service name.
-pid-file-required[=optional|required|skip]
    Tells the simple_sched daemon how to handle the PID file. If the option is not used, the behavior is optional; if it is used without a value, the behavior is required. Allowed values are:
        optional - try to create the PID file, but continue if it cannot be created
        required - create the PID file, and fail if it cannot be created
        skip - do not create the PID file
-boot
    simple_sched executes /bgsys/drivers/ppcfloor/bin/htcpartition to boot the partition. htcpartition must be able to get its boot parameters from the mpirun plug-in.

Shutting down

The simple_sched daemon can be shut down in one of three ways:

    Very slow - No more jobs are accepted. simple_sched waits until the submit queue is empty and all outstanding submits are complete. Trigger this by sending SIGINT (CTRL-C).
    Slow - No more jobs are accepted and the submit queue is cleared. simple_sched waits until all outstanding jobs are complete. Trigger this by sending SIGQUIT (CTRL-\).
    Quick - simple_sched exits immediately, without waiting for jobs to complete. Trigger this by sending SIGTERM.

startd daemon

This section provides details on the startd daemon.

Command-line options

In addition to the command-line options that override configuration options, the following options are available when starting the startd daemon.

-log-to-stdout
    The startd daemon logs to stdout. The default is to log using the syslog() API.
-pid-file-required[=optional|required|skip]
    Tells the startd daemon how to handle the PID file. If the option is not used, the behavior is optional; if it is used without a value, the behavior is required. Allowed values are:
        optional - try to create the PID file, but continue if it cannot be created
        required - create the PID file, and fail if it cannot be created
        skip - do not create the PID file

Submit plug-in

The startd daemon uses the submit plug-in to get information about the job back from the submit program. The submit program calls functions in the submit plug-in when a job ends; if the job failed, the data provided on the function call includes the reason for the failure. The submit command uses dlopen() to load the submit plug-in, so the shared library containing the plug-in must be configured in the dynamic linker.
The submit plug-in shared library can be configured in the dynamic linker in several ways, including setting LD_LIBRARY_PATH and editing ld.so.conf. The HTC Scheduler's submit plug-in is /bgsys/opt/simple_sched/lib/libsubmit_if.so. (run_simple_sched_jobs sets LD_LIBRARY_PATH, so if the HTC Scheduler is run only through run_simple_sched_jobs, no extra configuration is required.)

The submit plug-in provided by the HTC Scheduler also prevents other users from submitting jobs to the pool it is configured to use. It does this by reading the local HTC Scheduler configuration file, /bgsys/local/etc/simple_sched.cfg; if the pool entered on the command line is not set, or is the same as the pool in the local configuration file, the plug-in returns non-zero and submit fails.

Shutting down

The startd daemon can be shut down in one of three ways:

1. Very slow - startd tells simple_sched to stop sending work, then waits until all submits have finished. Trigger this by sending SIGINT (CTRL-C).
2. Slow - startd tells simple_sched to stop sending work; all current submits receive SIGTERM and should end quickly; startd then waits until all the submits have finished. Trigger this by sending SIGQUIT (CTRL-\).
3. Quick - startd exits without sending results. Trigger this by sending SIGTERM.
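The signal-to-mode mapping for simple_sched can be modeled as follows. This is a hypothetical Python sketch, not daemon code: ShutdownModel tracks only queued and running submit IDs and answers whether the daemon may exit yet. startd's modes are analogous, except that in slow mode it sends SIGTERM to its running submits instead of clearing a queue.

```python
import signal
from dataclasses import dataclass, field
from typing import List

# Signal -> shutdown mode for simple_sched, per "Shutting down" above.
SHUTDOWN_MODES = {
    signal.SIGINT: "very slow",   # stop accepting jobs; drain queue, wait for submits
    signal.SIGQUIT: "slow",       # stop accepting jobs; clear queue, wait for submits
    signal.SIGTERM: "quick",      # exit without waiting
}


@dataclass
class ShutdownModel:
    """Hypothetical model of the queue state simple_sched tracks while stopping."""
    queued: List[int] = field(default_factory=list)    # submit IDs waiting
    running: List[int] = field(default_factory=list)   # submit IDs assigned
    accepting: bool = True
    mode: str = ""

    def handle_signal(self, signum: int, frame=None) -> None:
        # Same (signum, frame) signature that signal.signal() handlers receive.
        self.mode = SHUTDOWN_MODES[signum]
        self.accepting = False
        if self.mode == "slow":
            self.queued.clear()

    def can_exit(self) -> bool:
        if self.mode == "quick":
            return True
        if self.mode == "slow":
            return not self.running
        if self.mode == "very slow":
            return not self.queued and not self.running
        return False   # no shutdown requested
```

To drive the model from real signals, register model.handle_signal with signal.signal() for each signal in SHUTDOWN_MODES.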

End-user commands

qcmd

qcmd sends commands to the HTC Scheduler. If qcmd cannot connect to the simple_sched daemon, it exits with an error message and a non-zero exit status.

Commands

The commands accepted by qcmd are listed below, followed by a description of the response types.

submit [OPTION]... COMMAND...
    Submit a job to run.
    Response: job status if -wait is specified; otherwise, the submit ID.
    Options:
    -mode=mode
        The mode that the job requires.
        Parameter: a mode, one of DUAL, LINUX, SMP, or VN.
        Default: the job can run in any mode. The server checks for an available HTC resource in this order: VN, DUAL, SMP, LINUX.
    -restartable
        Indicates that the job can be restarted if it fails.
        Default: the job cannot be restarted.
    -cwd=directory
        The working directory.
        Parameter: a directory name.
        Default: the current working directory.
    -exp_env=name
        Export an environment variable to the job. This option can be used multiple times to export multiple variables.
        Parameter: an environment variable name.
        Default: the environment variable is not exported to the job.
    -env_all
        Export all environment variables to the job.
        Default: no environment variables are exported.
    -env=name=value[ NAME=VALUE]
        Define environment variables for the job. This option can be used multiple times to define multiple environment variables.
        Parameter: a space-separated list of NAME=VALUE pairs.
    -name=name
        The name for the job, used as the base name for the output files.
        Parameter: any value that can be used in a file name.
        Default: submit
    -stdin-file=file
        The file from which to read standard input. This file must be readable when the program runs.
        Parameter: a file name. If the file name is not a full path, the file is opened relative to <cwd>.
        Default: /dev/null
    -stdout-file=file
        The file to which to write standard output. If this option is specified, the file is not removed even if it is empty.
        Parameter: a file name. If the file name is not a full path, the file is opened relative to <cwd>.
        Default: <name>-<submit-id>.out
    -stderr-file=file
        The file to which to write standard error. If this option is specified, the file is not removed even if it is empty.
        Parameter: a file name. If the file name is not a full path, the file is opened relative to <cwd>.
        Default: <name>-<submit-id>.err
    -wait
        Wait for results.
        Default: do not wait for results.

status [-wait] <submit-id>|all
    Display the state of a job, or of all jobs if all is specified.
    Response: job status
    Option -wait: wait for results. Default: do not wait for results.

cancel <submit-id>
    Cancel a submitted job.
    Response: job status

scheduler_status
    Display the scheduler status.
    Response: scheduler status

suspend
    The simple_sched daemon stops assigning jobs until it is resumed.
    Response: scheduler status

resume
    The simple_sched daemon resumes assigning jobs.
    Response: scheduler status

help [<command>]
    Display the command help summary, or detailed help for the specified command.

Immediate mode

If a command is supplied on the qcmd command line, qcmd executes that command and exits, as in the following examples:

    $ qcmd scheduler_status
    [running (submit queue=0) (submits=assigned:0 completed:5 notzero:0 error:0 canceled:0) (htc resources=smp:32/32 dual:0/0 vn:0/0 linux:32/32)]
    $ qcmd submit test.cna
    Submit id: 6

Interactive mode

qcmd operates in interactive mode when no command is supplied on the command line. In this mode, qcmd reads commands from stdin. Commands are processed asynchronously: each command creates a request that is sent to the simple_sched daemon, the next request can be sent before the previous one completes, and the responses may be received out of order.

When qcmd generates a request from a command, it assigns a unique request ID and displays the request ID and the command name, for example:

    $ cat cmds.txt
    suspend
    submit test
    $ cat cmds.txt | qcmd
    1 <- suspend
    2 <- submit

When qcmd receives a response from the simple_sched daemon, it prints the request ID and the response information. Continuing the previous example, the two requests received these responses:

    1 - [running (submit queue=0) (submits=assigned:0 completed:5 notzero:0 error:0 canceled:0) (htc resources=smp:32/32 dual:0/0 vn:0/0 linux:32/32)]
    2 - Submit id: 7

If a response is not the last response for its request, qcmd displays "->" after the request ID; if it is the last response for the request, qcmd displays "-".

When stdin is closed, qcmd continues receiving responses from the simple_sched daemon until the daemon indicates that there are no more outstanding requests. Users can take advantage of this behavior to submit several jobs: write "submit -wait" commands to qcmd, close stdin, and wait for qcmd to exit, at which point all the submitted jobs have completed. The run_simple_sched_jobs command uses this feature.

Response format

The format of the response information depends on the type of the response.

Submit ID response

If you submit without -wait, qcmd prints a response like "Submit ID: <submit-id>".
For example:

    $ qcmd submit test.cna
    1 <- submit
    1 - Submit ID: 1

Submit status response

If you run submit -wait or the status command, qcmd prints the current state and, if the state is COMPLETED, the exit status, and, if set, the error message. The format is "<submit-id> is <status>[ exit status <exit-status>][ error message '<error-msg>']".

    $ qcmd status 1
    1 <- status
    1 - 1 is COMPLETED exit status 0
    $ qcmd submit -wait test
    1 <- submit [wait]
    1 -> 2 is QUEUED
    1 -> 2 is ASSIGNED
    1 - 2 is COMPLETED exit status 0

Scheduler status response

If the response is a scheduler status, qcmd prints it using the following format:

    [<submit_thread_status>[ booting] <submit_queue_status> <submits_status> <resource_pool_status>]

where:
    submit_thread_status is either "running" or "suspended".
    booting is displayed if the server is waiting for htcpartition to finish booting the partition.
    submit_queue_status is "(submit queue=<count>)", the number of jobs in the submit queue.
    submits_status is "(submits=assigned:<count> completed:<count> notzero:<count> error:<count> canceled:<count>)", the number of jobs that are currently assigned, that have completed, that ended with a non-zero exit status, that did not run due to an error, and that were canceled.
    resource_pool_status is "(htc resources=smp:<avail>/<total> dual:<avail>/<total> vn:<avail>/<total> linux:<avail>/<total>)".

    $ qcmd scheduler_status
    [running (submit queue=0) (submits=assigned:0 completed:5 notzero:0 error:0 canceled:0) (htc resources=smp:32/32 dual:0/0 vn:0/0 linux:32/32)]

Request rejected response

The HTC Scheduler may reject a request, for example when a new job is submitted while the server is shutting down. The format is "Request failed '<errormsg>'".

    $ qcmd submit test.cna
    1 <- submit
    1 - Request failed 'submit rejected because shutting down'
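Scripts that drive qcmd in interactive mode need to separate the request-ID prefix from the response and recognize job status responses. The Python sketch below parses the formats shown in this section; it assumes single spaces around the <-, ->, and - markers as documented, and parse_line and parse_job_status are hypothetical names.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Interactive qcmd lines: "<id> <- <command>" (request sent), "<id> -> <info>"
# (more responses follow), "<id> - <info>" (last response for the request).
LINE = re.compile(r"(\d+) (<-|->|-) (.*)")
# Job status: "<submit-id> is <status>[ exit status <n>][ error message '<msg>']";
# any trailing "[location=...]" detail is ignored.
JOB = re.compile(r"(\d+) is ([A-Z]+)(?: exit status (\d+))?(?: error message '([^']*)')?")

KINDS = {"<-": "request", "->": "partial", "-": "final"}


class JobStatus(NamedTuple):
    submit_id: int
    state: str
    exit_status: Optional[int]
    error: Optional[str]


def parse_line(line: str) -> Tuple[int, str, str]:
    """Split one interactive qcmd output line into (request_id, kind, info)."""
    m = LINE.fullmatch(line.rstrip("\n"))
    if m is None:
        raise ValueError(f"unrecognized qcmd line: {line!r}")
    return int(m.group(1)), KINDS[m.group(2)], m.group(3)


def parse_job_status(info: str) -> Optional[JobStatus]:
    """Parse a job status response, or return None for other response types."""
    m = JOB.match(info)
    if m is None:
        return None   # scheduler status, "Submit ID: ...", or "Request failed ..."
    submit_id, state, code, error = m.groups()
    return JobStatus(int(submit_id), state, int(code) if code else None, error)
```

A script that has closed qcmd's stdin can stop tracking a request once parse_line reports a "final" line for it.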

qsub

qsub is a symbolic link to qcmd. When qcmd is called with the program name qsub, it performs a submit command. Refer to the documentation on qcmd's submit command for the parameters to qsub. Following is sample output of qsub:

    $ qsub test.cna
    Submit id: 2
    $ qsub -wait test.cna
    3 is QUEUED
    3 is ASSIGNED
    3 is ASSIGNED [location='r00-m0-n14-j10-c00' jobid=12345 partition='r00-m0-n14']
    3 is COMPLETED exit status 0 [location='r00-m0-n14-j10-c00' jobid=12345 partition='r00-m0-n14']
    $ qsub -stdout-file=/dev/null -stderr-file=/dev/null -cwd /bgusr/myhome -env=htc=true -- /bgusr/myhome/test.cna -opt1=opt1value
    Submit id: 4

qstat

qstat is used to get the status of a submitted job. It is a symbolic link to qcmd; when qcmd is called with the program name qstat, it performs a status command. Refer to the documentation on qcmd's status command for the parameters to qstat. Following is sample output of qstat:

    $ qstat 4
    4 is COMPLETED exit status 0 [location='r00-m0-n14-j10-c00' jobid=12345 partition='r00-m0-n14']
    $ qstat 7
    Status for 7 is not available.
    $ qstat -wait 6
    6 is ASSIGNED [location='r00-m0-n14-j19-c00' jobid=12345 partition='r00-m0-n14']
    6 is COMPLETED exit status 0 [location='r00-m0-n14-j19-c00' jobid=12345 partition='r00-m0-n14']
    $ qstat -wait all
    7 is QUEUED
    7 is ASSIGNED
    7 is ASSIGNED [location='r00-m0-n14-j59-c00' jobid=12345 partition='r00-m0-n14']
    7 is COMPLETED exit status 0 [location='r00-m0-n14-j19-c00' jobid=12345 partition='r00-m0-n14']
    8 is QUEUED
    8 is ASSIGNED
    8 is ASSIGNED [location='r00-m0-n14-j44-c00' jobid=12345 partition='r00-m0-n14']
    8 is COMPLETED exit status 0 [location='r00-m0-n14-j20-c00' jobid=12345 partition='r00-m0-n14']

qdel

qdel is used to cancel a submitted job. It is a symbolic link to qcmd; when qcmd is called with the program name qdel, it performs a cancel command. Refer to the documentation on qcmd's cancel command for the parameters to qdel.
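Choosing behavior from the name a program was invoked under is usually done by inspecting argv[0]. This Python sketch shows that idea for the three symbolic links described above; it is an illustration, not qcmd's source, and implied_command and effective_args are hypothetical names.

```python
import os
from typing import List, Optional

# Program names reached through symbolic links to qcmd, and the qcmd
# command each one implies (qsub -> submit, qstat -> status, qdel -> cancel).
_IMPLIED_COMMAND = {"qsub": "submit", "qstat": "status", "qdel": "cancel"}


def implied_command(argv0: str) -> Optional[str]:
    """Return the command implied by the invoked program name, or None for qcmd."""
    return _IMPLIED_COMMAND.get(os.path.basename(argv0))


def effective_args(argv: List[str]) -> List[str]:
    """Rewrite argv so that, e.g., ['qsub', 'test.cna'] becomes ['submit', 'test.cna']."""
    command = implied_command(argv[0])
    rest = list(argv[1:])
    return [command] + rest if command else rest
```

In a real entry point this would be called as effective_args(sys.argv); an empty result for plain qcmd corresponds to interactive mode.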

Following is sample output of qdel:

    $ qdel
    is CANCELED
    $ qdel
    is CANCELING [location='r00-m0-n14-j23-c00' jobid=12345 partition='r00-m0-n14']
    $ qdel
    is COMPLETED term signal 9 [location='r00-m0-n14-j10-c00' jobid=12345 partition='r00-m0-n14']
    $ qdel
    is COMPLETED exit status 0 [location='r00-m0-n14-j10-c00' jobid=12345 partition='r00-m0-n14']

In the previous examples, submit ID 10 was QUEUED when it was canceled; 11 was ASSIGNED when it was canceled the first time and had completed by the time it was canceled the second time; 12 had already exited when it was canceled.

Submitted job states

A job can be in the following states:

    QUEUED -- The job is in the queue and will run when an HTC resource and a startd are available. An error message may be available if the job failed and was requeued. When a job in this state is canceled, it goes to the CANCELED state.
    ASSIGNED -- The job is assigned to a startd to run. Information supplied by the submit program may be available (for example, the Blue Gene job ID and location). When a job in this state is canceled, it goes to the CANCELING state.
    COMPLETED -- The job has finished running; its exit status (or the signal that terminated it) is available.
    CANCELING -- The job has been canceled and the startd has been told to kill it. When it exits, it should go to the COMPLETED state, with the exit status indicating that it was killed by a signal.
    CANCELED -- The job was canceled without running.
    ERROR -- The HTC Scheduler wasn't able to run this job; an error message may be available.

Figure 3 illustrates the possible transitions between states.

[Figure 3: Submitted job states]
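The cancel rules in the state list can be captured as a small transition function. This Python sketch encodes only the transitions the text states explicitly (cancel from QUEUED or ASSIGNED, and an assigned or canceling job exiting to COMPLETED); requeueing and the transitions into ERROR are left out because the arrows of Figure 3 are not recoverable here. All names are hypothetical.

```python
from enum import Enum


class JobState(Enum):
    QUEUED = "QUEUED"
    ASSIGNED = "ASSIGNED"
    COMPLETED = "COMPLETED"
    CANCELING = "CANCELING"
    CANCELED = "CANCELED"
    ERROR = "ERROR"


# End states: the right-most states in Figure 3.
END_STATES = frozenset({JobState.COMPLETED, JobState.CANCELED, JobState.ERROR})


def state_after_cancel(state: JobState) -> JobState:
    """State a job moves to when a cancel (qdel) request arrives."""
    if state is JobState.QUEUED:
        return JobState.CANCELED   # never ran
    if state is JobState.ASSIGNED:
        return JobState.CANCELING  # the startd is told to kill it
    return state                   # CANCELING or an end state: no change


def state_after_exit(state: JobState) -> JobState:
    """State an assigned (or being-killed) job moves to when its process exits."""
    if state in (JobState.ASSIGNED, JobState.CANCELING):
        return JobState.COMPLETED
    raise ValueError(f"a job in state {state.value} has no running process")
```

Replaying the qdel examples: submit ID 10 goes from QUEUED to CANCELED; 11 goes from ASSIGNED to CANCELING, then to COMPLETED when the killed process exits; 12 stays COMPLETED.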

Note about when state info is available

When a job's state is reported to a client and that state is an end state (COMPLETED, CANCELED, or ERROR; the right-most states in Figure 3), the server clears its knowledge of the submitted job. Any further request for state using that submit ID gets a response of 'not found'.
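A practical consequence is that a client gets exactly one chance to observe a job's end state. The hypothetical Python table below illustrates the rule as described; it is not the simple_sched implementation, and states are kept as plain strings for brevity.

```python
from typing import Dict

END_STATES = {"COMPLETED", "CANCELED", "ERROR"}


class SubmitTable:
    """Hypothetical server-side table of submitted jobs keyed by submit ID."""

    def __init__(self) -> None:
        self._states: Dict[int, str] = {}

    def set_state(self, submit_id: int, state: str) -> None:
        self._states[submit_id] = state

    def report(self, submit_id: int) -> str:
        """Return the job's state; forget the job once an end state is reported."""
        state = self._states.get(submit_id)
        if state is None:
            return "not found"
        if state in END_STATES:
            del self._states[submit_id]
        return state
```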

run_simple_sched_jobs

run_simple_sched_jobs starts a personal instance of the HTC Scheduler and runs commands through it. To do this, it opens an ephemeral port (that is, Linux picks a port that is not in use) and creates an HTC Scheduler configuration file based on the base HTC Scheduler configuration, with these options replaced:

    scheduler_hostname -- is set to the current system's host name
    scheduler_service_name -- is set to the ephemeral port number
    pool_name -- is set to the configured pool name
    pool_size -- is set to the configured pool size

Note that if the -reuse-config option is specified, only scheduler_service_name is changed and the base configuration is not used.

Next, run_simple_sched_jobs forks and execs a simple_sched process with options to use the new configuration file and to accept connections on the socket descriptor that run_simple_sched_jobs opened to listen on the ephemeral port. Then it forks and execs a startd process that uses the new configuration file. If any command files are specified on the command line, run_simple_sched_jobs forks and execs a qcmd process using the new configuration file and writes a "submit -wait" command to it for each command. Output from qcmd is parsed for completion messages, which are echoed to stdout.

Once all the command files have been processed, and if the keep-running option (-keep-running) is not set, run_simple_sched_jobs signals the child processes to quit. It then waits for the simple_sched and startd child processes to exit.

Tip: Use the --configfile option when booting the partition with htcpartition, then pass the same configuration file as the -config parameter to run_simple_sched_jobs, which will read the pool name and pool size configuration options from this file.

Command files

run_simple_sched_jobs reads command files. Each line of a command file contains options to qcmd's submit command, an end of options indicator, the program to run, and the arguments to the program. Refer to qcmd's submit command options on page 21 for the options to the submit command. After the submit options, put the end of options indicator, --. If the end of options indicator is left off, any program arguments that start with - will be interpreted as options to the submit command.

The following command file line starts the location.cna program with the -print argument; its output is discarded because the -stdout-file and -stderr-file options are specified for qcmd's submit command:

    -stdout-file=/dev/null -stderr-file=/dev/null -cwd /bgusr/myhome -- location.cna -print

Each command line is converted into a qcmd submit command that is sent to the qcmd subprocess, and qcmd generates a request ID for the submitted job. Since request IDs are assigned starting at 1, the request ID matches the line number.
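Three pieces of the run_simple_sched_jobs startup described above can be sketched in Python: asking the kernel for an ephemeral port by binding to port 0, overriding the four configuration options, and turning command-file lines into "submit -wait" commands. This is an illustration under stated assumptions, not the tool's source: the configuration is held in a plain dict (the on-disk file format is covered in the Configuration section and is not reproduced), "-wait" is assumed to go directly after "submit", every line is assumed to be a command so that request IDs track line numbers, and all function names are hypothetical.

```python
import socket
from typing import List, Optional, Tuple


def open_ephemeral_listener(host: str = "") -> Tuple[socket.socket, int]:
    """Listen on port 0 so the kernel picks an unused port; return (socket, port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))
    sock.listen()
    return sock, sock.getsockname()[1]


def personal_config(start: dict, port: int,
                    pool_name: Optional[str] = None,
                    pool_size: Optional[str] = None,
                    reuse_config: bool = False) -> dict:
    """Return the options for the personal scheduler instance.

    `start` is the base configuration or, with reuse_config, the configuration
    written by an earlier run; in that case only the port is replaced.
    """
    cfg = dict(start)
    cfg["scheduler_service_name"] = str(port)
    if not reuse_config:
        cfg["scheduler_hostname"] = socket.gethostname()
        cfg["pool_name"] = pool_name
        cfg["pool_size"] = pool_size
    return cfg


def submit_commands(lines: List[str]) -> Tuple[List[Tuple[int, str]], List[str]]:
    """Turn command-file lines into (request_id, qcmd command) pairs plus warnings."""
    commands: List[Tuple[int, str]] = []
    warnings: List[str] = []
    for request_id, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if "--" not in line.split():
            # Without "--", program arguments starting with "-" would be
            # taken as options to qcmd's submit command.
            warnings.append(f"line {request_id}: no '--' end of options indicator")
        commands.append((request_id, "submit -wait " + line))
    return commands, warnings
```

Binding to port 0 before starting any child process lets run_simple_sched_jobs learn the port number it must write into scheduler_service_name, and the already-listening socket is the descriptor simple_sched then accepts connections on.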

Output

run_simple_sched_jobs prints a line whenever it is notified that a submitted command has completed or that a submitted command was rejected (this should be rare). Completion lines look like:

    1-1 is COMPLETED exit status 0 [location='r00-m0-n14-j10-c00' jobid=12345 partition='r00-m0-n14']

This is the standard completion line from the qcmd program. The first number is the request ID (command number), which starts at 1 and is incremented for each command submitted. The second number is the submit ID that the simple_sched daemon assigned to the job.

When run_simple_sched_jobs exits, it prints a line summarizing the results, like this:

    Submitted 128 jobs, 128 completed, 60 had non-zero exit status, 0 requests failed.

Signal handling

If run_simple_sched_jobs receives SIGINT (CTRL-C), it goes into slow shutdown mode: no new commands are accepted, and all submitted jobs run to completion.

If run_simple_sched_jobs receives SIGQUIT (CTRL-\) or SIGTERM (kill), it goes into fast shutdown mode: no new commands are accepted, and all submitted jobs are canceled.

Positional parameters

The positional parameters are the names of command files (see Command files). If a name is -, run_simple_sched_jobs reads commands from stdin until it reads end-of-file. If there are no positional parameters, no commands are run, which is only useful with the -keep-running option.

LoadLeveler integration

run_simple_sched_jobs looks for environment variables set by LoadLeveler and changes its behavior when they are set. The environment variables set by LoadLeveler are LOADL_BG_PARTITION, LOADL_BG_SIZE, and LOADL_BG_PARTITION_TYPE. When these environment variables are set, run_simple_sched_jobs creates a temporary configuration file containing the pool name and pool size derived from these values. It also automatically passes -boot to simple_sched.

Configuration

There are several configuration options. When an option can be specified in multiple ways, a command-line option takes precedence over an environment variable.
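The precedence rule can be expressed as a short lookup: the command-line value first, then the environment variable, then the default. In this Python sketch the default may also be a callable, which covers the pool name and pool size defaults (read from the configuration file only when -reuse-config is set or the file was created by htcpartition). The helper names are hypothetical, and pool_default assumes the configuration file has already been parsed into a dict keyed by option name. The environment variable names and default paths used in the usage lines come from the option descriptions that follow.

```python
import os
from typing import Callable, Mapping, Optional, Union


class MissingOptionError(Exception):
    """Raised when an option has no value from any source and no default."""


def resolve(name: str, cmdline_value: Optional[str], env_var: Optional[str],
            default: Union[str, None, Callable[[], Optional[str]]],
            environ: Optional[Mapping[str, str]] = None) -> str:
    """Command-line value first, then the environment variable, then the default."""
    environ = os.environ if environ is None else environ
    if cmdline_value is not None:
        return cmdline_value
    if env_var is not None and env_var in environ:
        return environ[env_var]
    value = default() if callable(default) else default
    if value is None:
        raise MissingOptionError(f"{name} has no default and must be specified")
    return value


def pool_default(key: str, config_values: Mapping[str, str],
                 reuse_config: bool, created_by_htcpartition: bool) -> Optional[str]:
    """Default for pool_name/pool_size: read from the -config file when usable."""
    if reuse_config or created_by_htcpartition:
        return config_values.get(key)
    return None
```

For example, resolve("startd-exe", None, "RUN_JOBS_STARTD_EXE", "/bgsys/opt/simple_sched/sbin/startd") returns the environment variable's value if it is set and the documented default path otherwise.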

Keep running
    Description: run_simple_sched_jobs continues running after all command files have been processed. This can be used to submit jobs to the partition using qsub; simply specify the same configuration file when invoking qsub.
    Command-line option: -keep-running
    Default: Exit after all jobs have completed.

Suspend
    Description: run_simple_sched_jobs starts the simple_sched daemon suspended. Use the "qcmd resume" command to make the simple_sched daemon start assigning jobs.
    Command-line option: -suspend
    Default: The simple_sched daemon is started in running mode.

Boot
    Description: run_simple_sched_jobs passes -boot to the simple_sched process.
    Command-line option: -boot
    Default: run_simple_sched_jobs does not pass -boot to the simple_sched process.

Configuration file
    Description: run_simple_sched_jobs first attempts to read this file as an htcpartition output configuration file to get the partition information; it then creates or overwrites this file with the new HTC Scheduler configuration.
    Command-line option: -config <filename>
    Environment variable: RUN_JOBS_CONFIG_FILE
    Default: Use mkstemp() to create a temporary file whose name is like "my_simple_sched.cfg.xxxxxx". This file is deleted when run_simple_sched_jobs exits.

Re-use configuration file
    Description: Tells run_simple_sched_jobs to re-use the configuration file from an earlier run rather than create a new one. This is useful when calling run_simple_sched_jobs again for the same HTC boot.
    Command-line option: -reuse-config
    Default: The configuration file is not reused.

Base configuration file
    Description: The configuration file to use for the base configuration. Your personal HTC Scheduler instance uses several options from this configuration file, for example, submit_path and submit_args.
    Command-line option: -base-config-file=<filename>
    Environment variable: SIMPLE_SCHED_CONFIG_FILE (see the HTC Scheduler configuration section above)
    Default: Search for the configuration file as described in the Configuration section starting on page 14.

Pool name
    Description: The pool that the HTC Scheduler will use.
    Command-line option: -pool-name=<pool-name>
    Environment variable: RUN_JOBS_POOL_NAME
    Default: If the configuration file is re-used, taken from the configuration file. Otherwise, if the configuration file was created by htcpartition, taken from the configuration file. Otherwise there is no default, and this option must be specified.

Pool size
    Description: The size of the pool that the HTC Scheduler will use; see the HTC Scheduler configuration for a description of the format (the value also specifies the mode).
    Command-line option: -pool-size=<pool-size>
    Environment variable: RUN_JOBS_POOL_SIZE
    Default: If the configuration file is re-used, taken from the configuration file. Otherwise, if the configuration file was created by htcpartition, taken from the configuration file. Otherwise there is no default, and this option must be specified.

simple_sched daemon executable
    Description: The program that is executed for the HTC Scheduler process. You will probably not need to change this.
    Command-line option: -simple-sched-exe=<executable>
    Environment variable: RUN_JOBS_SIMPLE_SCHED_EXE
    Default: /bgsys/opt/simple_sched/sbin/simple_sched

startd daemon executable
    Description: The program that is executed for the startd process. You will probably not need to change this.
    Command-line option: -startd-exe=<executable>
    Environment variable: RUN_JOBS_STARTD_EXE
    Default: /bgsys/opt/simple_sched/sbin/startd

Qcmd executable
    Description: The program that is executed for the qcmd process. You will probably not need to change this.
    Command-line option: -qcmd-exe=<executable>
    Environment variable: RUN_JOBS_QCMD_EXE
    Default: /bgsys/opt/simple_sched/bin/qcmd

Verboseness
    Description: Sets the verboseness of run_simple_sched_jobs. The verboseness of the child processes can be overridden using the following command-line options:
        -verbose-qcmd -- The qcmd process
        -verbose-simple-sched -- The simple_sched process
        -verbose-startd -- The startd process
      See the verbose section in the Configuration section starting on page 14 for the allowed values.
    Command-line option: -verbose[=<verbose-level>]
    Default: run_simple_sched_jobs displays no log messages. For the child processes, the default log level is Warning.


More information

Processes. CS3026 Operating Systems Lecture 05

Processes. CS3026 Operating Systems Lecture 05 Processes CS3026 Operating Systems Lecture 05 Dispatcher Admit Ready Queue Dispatch Processor Release Timeout or Yield Event Occurs Blocked Queue Event Wait Implementation: Using one Ready and one Blocked

More information

OBTAINING AN ACCOUNT:

OBTAINING AN ACCOUNT: HPC Usage Policies The IIA High Performance Computing (HPC) System is managed by the Computer Management Committee. The User Policies here were developed by the Committee. The user policies below aim to

More information

simplevisor Documentation

simplevisor Documentation simplevisor Documentation Release 1.2 Massimo Paladin June 27, 2016 Contents 1 Main Features 1 2 Installation 3 3 Configuration 5 4 simplevisor command 9 5 simplevisor-control command 13 6 Supervisor

More information

Job Management on LONI and LSU HPC clusters

Job Management on LONI and LSU HPC clusters Job Management on LONI and LSU HPC clusters Le Yan HPC Consultant User Services @ LONI Outline Overview Batch queuing system Job queues on LONI clusters Basic commands The Cluster Environment Multiple

More information

Introduction to the Shell

Introduction to the Shell [Software Development] Introduction to the Shell Davide Balzarotti Eurecom Sophia Antipolis, France What a Linux Desktop Installation looks like What you need Few Words about the Graphic Interface Unlike

More information

Configuring and Managing Embedded Event Manager Policies

Configuring and Managing Embedded Event Manager Policies Configuring and Managing Embedded Event Manager Policies The Cisco IOS XR Software Embedded Event Manager (EEM) functions as the central clearing house for the events detected by any portion of the Cisco

More information

Sharpen Exercise: Using HPC resources and running parallel applications

Sharpen Exercise: Using HPC resources and running parallel applications Sharpen Exercise: Using HPC resources and running parallel applications Andrew Turner, Dominic Sloan-Murphy, David Henty, Adrian Jackson Contents 1 Aims 2 2 Introduction 2 3 Instructions 3 3.1 Log into

More information

The Unix Shell & Shell Scripts

The Unix Shell & Shell Scripts The Unix Shell & Shell Scripts You should do steps 1 to 7 before going to the lab. Use the Linux system you installed in the previous lab. In the lab do step 8, the TA may give you additional exercises

More information

SGI Altix Running Batch Jobs With PBSPro Reiner Vogelsang SGI GmbH

SGI Altix Running Batch Jobs With PBSPro Reiner Vogelsang SGI GmbH SGI Altix Running Batch Jobs With PBSPro Reiner Vogelsang SGI GmbH reiner@sgi.com Module Objectives After completion of this module you should be able to Submit batch jobs Create job chains Monitor your

More information

Kea Messages Manual. Kea Messages Manual

Kea Messages Manual. Kea Messages Manual Kea Messages Manual i Kea Messages Manual Kea Messages Manual ii Copyright 2011-2018 Internet Systems Consortium, Inc. ("ISC") Kea Messages Manual iii Contents 1 Introduction 1 2 Kea Log Messages 2 2.1

More information

An Introduction to Cluster Computing Using Newton

An Introduction to Cluster Computing Using Newton An Introduction to Cluster Computing Using Newton Jason Harris and Dylan Storey March 25th, 2014 Jason Harris and Dylan Storey Introduction to Cluster Computing March 25th, 2014 1 / 26 Workshop design.

More information

OS lpr. www. nfsd gcc emacs ls 1/27/09. Process Management. CS 537 Lecture 3: Processes. Example OS in operation. Why Processes? Simplicity + Speed

OS lpr. www. nfsd gcc emacs ls 1/27/09. Process Management. CS 537 Lecture 3: Processes. Example OS in operation. Why Processes? Simplicity + Speed Process Management CS 537 Lecture 3: Processes Michael Swift This lecture begins a series of topics on processes, threads, and synchronization Today: processes and process management what are the OS units

More information

Admin Guide ( Unix System Administration )

Admin Guide ( Unix System Administration ) Admin Guide ( Unix System Administration ) ProFTPD Server Configuration ProFTPD is a secure and configurable FTP server, written for use on Unix and Unix-like operating systems. ProFTPD is modeled around

More information

CS 326: Operating Systems. Process Execution. Lecture 5

CS 326: Operating Systems. Process Execution. Lecture 5 CS 326: Operating Systems Process Execution Lecture 5 Today s Schedule Process Creation Threads Limited Direct Execution Basic Scheduling 2/5/18 CS 326: Operating Systems 2 Today s Schedule Process Creation

More information

IBM DB2 Query Patroller. Administration Guide. Version 7 SC

IBM DB2 Query Patroller. Administration Guide. Version 7 SC IBM DB2 Query Patroller Administration Guide Version 7 SC09-2958-00 IBM DB2 Query Patroller Administration Guide Version 7 SC09-2958-00 Before using this information and the product it supports, be sure

More information

Introduction to remote command line Linux. Research Computing Team University of Birmingham

Introduction to remote command line Linux. Research Computing Team University of Birmingham Introduction to remote command line Linux Research Computing Team University of Birmingham Linux/UNIX/BSD/OSX/what? v All different v UNIX is the oldest, mostly now commercial only in large environments

More information

o Reality The CPU switches between each process rapidly (multiprogramming) Only one program is active at a given time

o Reality The CPU switches between each process rapidly (multiprogramming) Only one program is active at a given time Introduction o Processes are a key concept in operating systems Abstraction of a running program Contains all information necessary to run o On modern systems, many processes are active at the same time

More information

(MCQZ-CS604 Operating Systems)

(MCQZ-CS604 Operating Systems) command to resume the execution of a suspended job in the foreground fg (Page 68) bg jobs kill commands in Linux is used to copy file is cp (Page 30) mv mkdir The process id returned to the child process

More information

Configuring System Message Logging

Configuring System Message Logging CHAPTER 1 This chapter describes how to configure system message logging on the Cisco 4700 Series Application Control Engine (ACE) appliance. Each ACE contains a number of log files that retain records

More information

Lab 4. Out: Friday, February 25th, 2005

Lab 4. Out: Friday, February 25th, 2005 CS034 Intro to Systems Programming Doeppner & Van Hentenryck Lab 4 Out: Friday, February 25th, 2005 What you ll learn. In this lab, you ll learn to use function pointers in a variety of applications. You

More information

Image Sharpening. Practical Introduction to HPC Exercise. Instructions for Cirrus Tier-2 System

Image Sharpening. Practical Introduction to HPC Exercise. Instructions for Cirrus Tier-2 System Image Sharpening Practical Introduction to HPC Exercise Instructions for Cirrus Tier-2 System 2 1. Aims The aim of this exercise is to get you used to logging into an HPC resource, using the command line

More information

Cross-platform daemonization tools.

Cross-platform daemonization tools. Cross-platform daemonization tools. Release 0.1.0 Muterra, Inc Sep 14, 2017 Contents 1 What is Daemoniker? 1 1.1 Installing................................................. 1 1.2 Example usage..............................................

More information

Implementation of a simple shell, xssh

Implementation of a simple shell, xssh Implementation of a simple shell, xssh What is a shell? A process that does command line interpretation Reads a command from standard input (stdin) Executes command corresponding to input line In the simple

More information

New User Seminar: Part 2 (best practices)

New User Seminar: Part 2 (best practices) New User Seminar: Part 2 (best practices) General Interest Seminar January 2015 Hugh Merz merz@sharcnet.ca Session Outline Submitting Jobs Minimizing queue waits Investigating jobs Checkpointing Efficiency

More information

Essentials for Scientific Computing: Bash Shell Scripting Day 3

Essentials for Scientific Computing: Bash Shell Scripting Day 3 Essentials for Scientific Computing: Bash Shell Scripting Day 3 Ershaad Ahamed TUE-CMS, JNCASR May 2012 1 Introduction In the previous sessions, you have been using basic commands in the shell. The bash

More information

GPU Cluster Usage Tutorial

GPU Cluster Usage Tutorial GPU Cluster Usage Tutorial How to make caffe and enjoy tensorflow on Torque 2016 11 12 Yunfeng Wang 1 PBS and Torque PBS: Portable Batch System, computer software that performs job scheduling versions

More information

Computer Science 330 Operating Systems Siena College Spring Lab 5: Unix Systems Programming Due: 4:00 PM, Wednesday, February 29, 2012

Computer Science 330 Operating Systems Siena College Spring Lab 5: Unix Systems Programming Due: 4:00 PM, Wednesday, February 29, 2012 Computer Science 330 Operating Systems Siena College Spring 2012 Lab 5: Unix Systems Programming Due: 4:00 PM, Wednesday, February 29, 2012 Quote: UNIX system calls, reading about those can be about as

More information

History of SURAgrid Deployment

History of SURAgrid Deployment All Hands Meeting: May 20, 2013 History of SURAgrid Deployment Steve Johnson Texas A&M University Copyright 2013, Steve Johnson, All Rights Reserved. Original Deployment Each job would send entire R binary

More information

Introduction Variables Helper commands Control Flow Constructs Basic Plumbing. Bash Scripting. Alessandro Barenghi

Introduction Variables Helper commands Control Flow Constructs Basic Plumbing. Bash Scripting. Alessandro Barenghi Bash Scripting Alessandro Barenghi Dipartimento di Elettronica, Informazione e Bioingegneria Politecnico di Milano alessandro.barenghi - at - polimi.it April 28, 2015 Introduction The bash command shell

More information

SUDO(8) System Manager s Manual SUDO(8)

SUDO(8) System Manager s Manual SUDO(8) NAME sudo, sudoedit - execute a command as another user SYNOPSIS sudo -h -K -k -V sudo -v [-AknS] [-a type] [-g group] [-h host] [-p prompt] [-u user] sudo -l [-AknS] [-a type] [-g group] [-h host] [-p

More information

Shell and Utility Commands

Shell and Utility Commands Table of contents 1 Shell Commands... 2 2 Utility Commands... 3 1 Shell Commands 1.1 fs Invokes any FsShell command from within a Pig script or the Grunt shell. 1.1.1 Syntax fs subcommand subcommand_parameters

More information

A shell can be used in one of two ways:

A shell can be used in one of two ways: Shell Scripting 1 A shell can be used in one of two ways: A command interpreter, used interactively A programming language, to write shell scripts (your own custom commands) 2 If we have a set of commands

More information

Killing Zombies, Working, Sleeping, and Spawning Children

Killing Zombies, Working, Sleeping, and Spawning Children Killing Zombies, Working, Sleeping, and Spawning Children CS 333 Prof. Karavanic (c) 2015 Karen L. Karavanic 1 The Process Model The OS loads program code and starts each job. Then it cleans up afterwards,

More information

HOD User Guide. Table of contents

HOD User Guide. Table of contents Table of contents 1 Introduction...3 2 Getting Started Using HOD... 3 2.1 A typical HOD session... 3 2.2 Running hadoop scripts using HOD...5 3 HOD Features... 6 3.1 Provisioning and Managing Hadoop Clusters...6

More information

Linux System Administration

Linux System Administration System Processes Objective At the conclusion of this module, the student will be able to: Describe and define a process Identify a process ID, the parent process and the child process Learn the PID for

More information

PBS Pro Documentation

PBS Pro Documentation Introduction Most jobs will require greater resources than are available on individual nodes. All jobs must be scheduled via the batch job system. The batch job system in use is PBS Pro. Jobs are submitted

More information

ECE 650 Systems Programming & Engineering. Spring 2018

ECE 650 Systems Programming & Engineering. Spring 2018 ECE 650 Systems Programming & Engineering Spring 2018 User Space / Kernel Interaction Tyler Bletsch Duke University Slides are adapted from Brian Rogers (Duke) Operating System Services User and other

More information

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2018 Lecture 20

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2018 Lecture 20 CS24: INTRODUCTION TO COMPUTING SYSTEMS Spring 2018 Lecture 20 LAST TIME: UNIX PROCESS MODEL Began covering the UNIX process model and API Information associated with each process: A PID (process ID) to

More information

RSYSLOGD(8) Linux System Administration RSYSLOGD(8)

RSYSLOGD(8) Linux System Administration RSYSLOGD(8) NAME rsyslogd reliable and extended syslogd SYNOPSIS rsyslogd [ 4 ][ 6 ][ A ][ d ][ D ][ f config file ] [ i pid file ][ l hostlist ][ n ][ N level ] [ q ][ Q ][ s domainlist ][ u userlevel ][ v ][ w ][

More information

Logging in to the CRAY

Logging in to the CRAY Logging in to the CRAY 1. Open Terminal Cray Hostname: cray2.colostate.edu Cray IP address: 129.82.103.183 On a Mac 2. type ssh username@cray2.colostate.edu where username is your account name 3. enter

More information

Mid Term from Feb-2005 to Nov 2012 CS604- Operating System

Mid Term from Feb-2005 to Nov 2012 CS604- Operating System Mid Term from Feb-2005 to Nov 2012 CS604- Operating System Latest Solved from Mid term Papers Resource Person Hina 1-The problem with priority scheduling algorithm is. Deadlock Starvation (Page# 84) Aging

More information

Quick Guide for the Torque Cluster Manager

Quick Guide for the Torque Cluster Manager Quick Guide for the Torque Cluster Manager Introduction: One of the main purposes of the Aries Cluster is to accommodate especially long-running programs. Users who run long jobs (which take hours or days

More information