Platform LSF concepts and terminology


1 Platform LSF concepts and terminology
Course materials may not be reproduced in whole or in part without the prior written permission of IBM.

2 Unit objectives
After completing this unit, you should be able to:
Define Platform LSF terminology
Describe LSF architecture

3 What is Platform LSF
Platform LSF (Load Sharing Facility) is a suite of distributed resource management products that:
Connects computers into a cluster (or grid)
Monitors the loads of systems
Distributes, schedules, and balances workload
Controls access and load by policies
Analyzes the workload
Provides transparent access to all available resources

4 Without Platform LSF
Which node can run my job or task?
In a distributed environment (hundreds of hosts):
Monitoring and controlling of resources is complex
Workload is silo-based
Resource usage is imbalanced
Users perceive a lack of resources

5 Without Platform LSF: Example
"I don't have enough resources to do my work!"
Patterns:
1. Users over-utilize popular, well-known hosts; users perceive a lack of resources
2. New hosts go under-utilized; administrators perceive excessive resources
Overall resource usage is poor and customer satisfaction is reduced.

6 With Platform LSF
Now, Platform LSF will run my job or task on the best node available!
All nodes are grouped into a cluster: a virtual pool of computing resources managed by Platform LSF.
In a Platform LSF environment (hundreds of hosts):
Monitoring and controlling of resources is simple
Workload is balanced
Resource usage is balanced
Users perceive a plethora of resources

7 With Platform LSF - Example
With Platform LSF, I now have enough resources to do my work!
Patterns:
1. Users' jobs are spread out across cluster nodes, so resources are not over-utilized
2. New hosts become utilized as soon as LSF is installed and started on the host
Overall resource usage is optimized and customer satisfaction is increased.
All nodes are grouped into a cluster: a virtual pool of computing resources managed by Platform LSF.

8 Key Platform LSF objectives Provide the means to create a powerful computer system made up of many smaller systems to increase productivity and lower operating costs. Match limited supply of resources with demand.

9 Platform LSF benefits
Cost management: Spend money where it counts; increase profits.
Time management: Reduce resource idle time.
Productivity management: Improve personnel productivity, design, and product quality; increase job throughput; reduce time-to-market and costs.
Resource management: Maximize license utilization, resource usage, and sharing of resources.

10 Platform LSF architecture
(Architecture diagram: enterprise applications consume application services, in real time or on demand, drawing on heterogeneous enterprise resources such as servers, network, licenses, data, storage, and bandwidth.)

11 Platform LSF terminology (1 of 2)
Cluster: A collection of TCP/IP networked hosts running Platform LSF.
Master host: A cluster requires a master host. This is the first host installed. The master host controls the rest of the hosts in the grid.
Master candidates: Hosts that can take over as the master host on failover.
Server host: A host within the cluster that submits and executes jobs and tasks.
Client host: A host within the cluster that only submits jobs and tasks.

12 Platform LSF terminology (2 of 2)
Execution host: The host that executes the job or task.
Submission host: The host from which a job or task is submitted.
Job: A command submitted to Platform LSF batch. Can take more than one job slot.
Queue: A network-wide holding place for jobs which implements different job scheduling and control policies.
Job slot: The basic unit of processor allocation in Platform LSF. Can be more than one per physical processor.

13 Platform LSF terminology diagram (1 of 2)
Cluster:
Client host: Acts as a submission host.
Master host: Acts as a submission host and execution host. Can be CLOSED to not allow execution or submission.
Server host: Acts as a submission host and execution host. Can also be a master candidate.

14 Platform LSF terminology diagram (2 of 2)
Submission host (LSF client or LSF server): The job is submitted to LSF from this host.
Master host (LSF master): The job is placed in a queue, passed to the scheduler, and dispatched to an execution host with an open job slot.
Execution host (LSF server): Receives the job and begins execution of the job or task.

15 Platform LSF overview
1. Individual machines are grouped into a cluster to be managed by Platform LSF.
2. One machine in the cluster is selected as the master of Platform LSF.
3. Each slave machine in the cluster collects its own vital signs periodically and reports them back to the master.
4. Users then submit their jobs to Platform LSF and the master makes a decision on where to run the job based on the collected vital signs.

16 Platform LSF job submission and control

17 Unit objectives
After completing this unit, you should be able to:
Submit jobs
Manipulate jobs, such as stopping, resuming, and terminating them
View detailed job information

18 Setting up the Platform LSF environment
Set up the Platform LSF environment before using Platform LSF commands.
C-shell type shells (csh, tcsh):
% source /YZGROUP/ibm/lsf/conf/cshrc.lsf
Bourne type shells (sh, bash, ksh, and so on):
$ . /YZGROUP/ibm/lsf/conf/profile.lsf
Check after setting up the environment:
% env | grep -i lsf
MANPATH=/YZGROUP/ibm/lsf/10.1/man:
LSF_SERVERDIR=/YZGROUP/ibm/lsf/10.1/linux2.6-glibc2.3-x86_64/etc
LSF_LIBDIR=/YZGROUP/ibm/lsf/10.1/linux2.6-glibc2.3-x86_64/lib
LD_LIBRARY_PATH=/tools/user01/lsf/9.1/linux2.6-glibc2.3-x86_64/lib
PATH=/YZGROUP/ibm/lsf/10.1/linux2.6-glibc2.3-x86_64/etc:/YZGROUP/ibm/lsf/10.1/linux2.6-glibc2.3-x86_64/bin:/bin:/usr/bin:/usr/local/bin
LSF_BINDIR=/YZGROUP/ibm/lsf/10.1/linux2.6-glibc2.3-x86_64/bin
LSF_ENVDIR=/YZGROUP/ibm/lsf/conf

19 Job submission and control commands
bsub [options] command [cmdargs]
bjobs [-a] [-J jobname] [-u usergroup | -u all] [...] jobid
bhist [-a] [-J jobname] [-u usergroup | -u all] [...] jobid
bbot/btop [jobid | "jobid[index_list]"] [position]
bkill [-J jobname] [-m host_name] [-u user_name] [-q queue_name] [-s signal_value] jobid
bmod [bsub_options] jobid
bpeek [-f] jobid
bstop/bresume jobid
bswitch destination_queue jobid
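A minimal end-to-end sequence using these commands might look like the following (the job ID, job name, and output are illustrative):
% bsub -J testrun sleep 60
Job <301> is submitted to default queue <normal>.
% bjobs 301
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
301 user1 RUN normal host1 host2 testrun Nov 21 10:05
% bkill 301
Job <301> is being terminated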

20 Platform LSF job submission commands (1 of 4)
bsub [commonly used options]
-n # - number of CPUs required for the job
-o filename - redirect stdout, stderr, and resource usage information of the job to the specified output file
-oo filename - same as -o, but overwrite the file if it exists
-e filename - redirect stderr to the specified error file
-eo filename - same as -e, but overwrite the file if it exists
-i filename - use the specified file as standard input for the job
-q qname - submit the job to the specified queue
-m hname - select host(s) or host group; keywords all and others can be used
-J jobname - assign the specified name to the job
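For example, several of these common options can be combined in a single submission (the queue, host group, file names, and script are hypothetical):
% bsub -n 2 -q normal -m hostgroupa -J myrun -i /home/user1/myrun.in -o /home/user1/myrun.%J.out -e /home/user1/myrun.%J.err ./run_model.sh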

21 Platform LSF job submission commands (2 of 4)
bsub [options]
-Q "exit_code ... [EXCLUDE(exit_code ...)]" - specify success exit codes
-L login_shell - initialize the execution environment using the specified login shell
-n number - if PARALLEL_SCHED_BY_SLOT=Y in lsb.params, specifies the number of job slots, not processors
-g jobgroup - submit the job to the specified job group
-sla serviceclass - submit the job to the specified service class
-W runlimit - if ABS_RUNLIMIT=Y, uses wall clock time
-app application_profile - submit the job to the specified application profile
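A sketch combining several of these options (the job group path and script name are hypothetical; here -Q treats exit code 0 as success and excludes code 130 from exit-code handling):
% bsub -Q "0 EXCLUDE(130)" -L /bin/bash -g /proj/teamA -W 60 ./nightly_build.sh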

22 Platform LSF job submission commands (3 of 4)
bsub [setting limits]
-C core_limit - set the per-process core file size limit (KB)
-c cpu_time - limit the total CPU time for the job ([HH:]MM)
-cwd dir_path - specify the current working directory for the job
-W runlimit - set the run time limit for the job ([HH:]MM)
-We run_time - specify an estimated run time for the job ([HH:]MM)
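For instance, to cap resource consumption on a hypothetical simulation job (the paths and times are illustrative):
% bsub -C 0 -c 30 -cwd /scratch/user1/run1 -W 2:00 -We 1:30 ./simulate.sh
This disables core file creation (-C 0), limits total CPU time to 30 minutes, runs the job in /scratch/user1/run1, enforces a two-hour run limit, and gives the scheduler an estimated run time of 90 minutes.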

23 Platform LSF job submission commands (4 of 4)
bsub [options]
-B - send email when the job is dispatched
-H - hold the job in the PSUSP state after submission
-N - send the job report (job header only) by email when the job completes
-b begin_time - dispatch the job after the specified time (with year and time)
-G user_group - associate the job with the specified user group (fairshare)
-t term_time - specify the job termination deadline (with year and time)
-u mail_user - send email to the specified address
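A hedged example combining the notification options (the email address and script are placeholders):
% bsub -B -N -u user1@example.com -b 22:00 ./batch_report.sh
This sends email when the job is dispatched (-B) and when it completes (-N), both to user1@example.com, and holds dispatch until 22:00.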

24 Platform LSF interactive job submission
bsub -I - submit an interactive batch job
bsub -Ip - submit an interactive batch job with pseudo-tty support
bsub -Is - submit an interactive batch job and create a pseudo-tty with shell mode support
Examples:
[lsfuser@host1 conf]$ bsub -I hostname
Job <212> is submitted to default queue <interactive>.
<<Waiting for dispatch...>>
<<Starting on host2>>
host2

25 Platform LSF interactive: SSH support
Four bsub options handle interactive jobs using an SSH connection:
-IS - submit an interactive job with SSH
-ISp - submit an interactive pseudo-terminal job with SSH
-ISs - submit an interactive pseudo-terminal shell job with SSH
-IX - submit an interactive X-window job with SSH
SSH port forwarding is used to create a secured channel between the submission host and the execution host for X-window jobs.
There is no need to run the xhost + command on the X-server host; the X-server host can handle connections through the SSH channel.
Use the commands netstat -an and ps -ef to check the SSH channel of the interactive job.
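For example, an X11 application could be submitted over the secured channel like this (assuming SSH and X forwarding are configured between the hosts; xclock is just a stand-in):
% bsub -IX xclock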

26 bsub: Condensed host notations
bsub -m "host[1-11,24]+1 host[12-23]+2" job1
The (+) indicates the preference level: host[12-23] are the most preferred, and host[1-11] and host24 are the least preferred.
Commands that can use the notation: bsub, brun, bmod, brestart, brsvadd, brsvmod, bswitch, bjobs, bhist, bacct, brsvs, bmig, bpeek

27 bsub: 4 KB job name
bsub -J `pwd` command
Allows the user to use the CWD or PWD as the job name.
# bsub -J `pwd` sleep 100
Job <4>, Job Name </tools/home/john/regression/chip1>, User <awu>, Project <default>, Status <RUN>, Queue <normal>, Command <sleep 100>
Thu Apr 17 17:48:43: Submitted from host <ibm01.platform.com>, CWD <$HOME/john/regression/tool/data>;
Thu Apr 17 17:51:55: Started on <ibm01.lsf.platform.com>, Execution Home </home/john>, Execution CWD </home/john/regression/tool/data>;
Thu Apr 17 17:52:12: Resource usage collected. MEM: 2 Mbytes; SWAP: 5 Mbytes; NTHREAD: 1; PGID: 30400; PIDs: 30400

28 Available methods for submitting jobs
By script or command:
% cd /home/user/project_dir
% bsub -q parallel -a fluent -n 4 ./my_fluent_launcher.sh
By job spooling:
% bsub < spoolfile
Example spoolfile:
#BSUB -q parallel
#BSUB -n 4
#BSUB -a fluent
cd /home/user/project_dir
./my_fluent_launcher.sh
Interactively:
% bsub
bsub> #BSUB -q parallel -n 4
bsub> #BSUB -a fluent
bsub> cd /home/user/project_dir
bsub> ./my_fluent_launcher.sh
bsub> ^D
Job <1234> submitted to queue <parallel>
Using Platform Application Center (PAC). Note: PAC is a separate product.

29 Platform LSF interactive task submission
lsrun:
Submits a task to Platform LSF for execution
Uses the Platform LSF base system only (no queues or policies)
Tasks will run even if hosts are closed
Tasks will run immediately (no scheduling delay)
Tasks will not run when a host is busy
Can be disabled with a configuration change
Note: As an LSF administrator, you DO NOT want users running this command to submit jobs, as it circumvents the scheduling policies! Disable it by setting LSF_DISABLE_LSRUN=Y in lsf.conf.
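For completeness, a typical lsrun invocation looks like the following (the host name is illustrative):
% lsrun -m host3 uname -a
This runs the task immediately on host3 through the LSF base system, bypassing the batch queues and their policies.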

30 Platform LSF job termination
bkill:
Bulk job termination: the bkill -b option terminates a bulk set of jobs from the system. This eases administration and decreases the time needed to delete a large number of jobs.
Example: To terminate all jobs of the current user:
% bkill -b 0

31 Viewing submitted job information
bjobs: Can display parallel jobs and condensed host groups in an aggregate format
-a - display information about jobs in all states (including finished jobs)
-A - display summarized information about job arrays
-d - display information about jobs that finished recently
-l / -w - display information in long or wide format
-p - display information about pending jobs
-r - display information about running jobs
-g job_group - display information about jobs in the specified group
-J job_name - display information about the specified job or array
-m host_list - display information about jobs on the specified hosts or groups
-P project_name - display information about jobs in the specified project
-q queue_name - display information about jobs in the specified queue
-u user_name - display information about jobs for the specified users or groups
-sum - display summary information about unfinished jobs
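These options can be combined; for example, to list the running jobs of all users in a particular queue (the queue name is illustrative):
% bjobs -u all -r -q normal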

32 Job name wildcard "*": -J option
Supports multiple wildcards in any position (beginning, middle, and end) of job and job array names.
Examples:
-J "joba*"
-J "job*a"
-J "*A"
-J "*" - matches all jobs
-J "myar*aya[1]"

33 View submitted job information example
% bjobs -u all -a
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
1233 user1 DONE normal training8 training1 *sortname Nov 21 10:02
1234 user1 RUN priority training8 training1 *verilog Nov 21 10:03
1235 user2 PEND night training9 *sortfile Nov 21 10:03
1236 user2 PEND normal training9 *sortname Nov 21 10:04
% bjobs -u all -J "*sort*"
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
1233 user1 DONE normal training8 training1 *sortname Nov 21 10:02
1235 user2 PEND night training9 *sortfile Nov 21 10:03
1236 user2 PEND normal training9 *sortname Nov 21 10:04

34 View detailed submitted job information
% bjobs -l 1235
Job <1235>, User <user2>, Project <default>, Status <PEND>, Queue <night>, Command <night_job>
Wed Nov 21 10:03:51: Submitted from host <host9>, CWD <$HOME>;
PENDING REASONS:
The queue is inactivated by its time windows;
SCHEDULING PARAMETERS:
r15s r1m r15m ut pg io ls it tmp swp mem
loadSched - - - - - - - - - - -
loadStop - - - - - - - - - - -
RESOURCE REQUIREMENT DETAILS:
Combined: select[type == local] order[r15s:pg]
Effective: -

35 Viewing historical job information
bhist:
-a - display information about all jobs (overrides -d, -p, -r, and -s)
-b / -l / -w - display information in brief, long, or wide format
-d - display information about finished jobs
-p - display information about pending jobs
-s - display information about suspended jobs
-t - display job events chronologically
-C / -D / -S / -T start_time,end_time - display information about completed, dispatched, submitted, or all jobs in the specified time window
-P project - display information about jobs belonging to the specified project
-q queue - display information about jobs submitted to the specified queue
-u username | all - display information about jobs submitted by the specified user or all users
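For example, to see jobs completed in a given time window for all users (the dates are illustrative; times take the form [year/][month/]day[/hour:minute]):
% bhist -C 2015/3/1,2015/3/31 -u all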

36 View historical job information example (1 of 2)
% bhist
Summary of time in seconds spent in various states:
JOBID USER JOB_NAME PEND PSUSP RUN USUSP SSUSP UNKWN TOTAL
2299 alfred *eep ...
2300 alfred *eep ...
2301 alfred *eep ...
2302 alfred *eep ...

37 View historical job information example (2 of 2)
% bhist -l 2302
Job <2302>, User <lsfuser>, Project <default>, Command <sleep 100>
Wed Mar 30 13:53:44: Submitted from host <host1>, to Queue <normal>, CWD <$HOME>;
Wed Mar 30 13:53:46: Dispatched to <host3>, Effective RES_REQ <select[type == local] order[r15s:pg]>;
Wed Mar 30 13:53:47: Starting (Pid 24595);
Wed Mar 30 13:53:52: Running with execution home </home/lsfuser>, Execution CWD </home/lsfuser>, Execution Pid <24595>;
Wed Mar 30 13:55:32: Done successfully. The CPU time used is 0.0 seconds;
Wed Mar 30 13:55:32: Post job process done successfully;
MEMORY USAGE:
MAX MEM: 3 Mbytes; AVG MEM: 3 Mbytes
Summary of time in seconds spent in various states by Wed Mar 30 13:55:32
PEND PSUSP RUN USUSP SSUSP UNKWN TOTAL
3 0 105 0 0 0 108

38 Manipulating jobs: options
bbot - moves a pending job to the bottom of the queue
btop - moves a pending job to the top of the queue
bkill - sends a signal to kill, suspend, or resume unfinished jobs (use a job ID of 0 to kill all your jobs); new scalability improvements result in improved performance and user experience
bmod - modifies the job submission options of a job
bpeek - displays the stdout and stderr of an unfinished job
bstop - suspends unfinished jobs
bresume - resumes one or more suspended jobs
bswitch - switches unfinished jobs to another queue

39 Platform LSF job submission examples (1 of 2)
Example #1
% bsub /home/lsf/scripts/sleeper
Job <1234> is submitted to default queue <normal>
% bstop 1234
Job <1234> is being stopped
% bresume 1234
Job <1234> is being resumed
Example #2
% bsub -q night -m "host1 host2" night_job
Job <1235> is submitted to queue <night>
% bmod -m "hostgroupa" 1235
Parameters of job <1235> are being changed

40 Platform LSF job submission examples (2 of 2)
Example #3
% bsub -i "/home/user1/in.dat" -o "/home/user1/out.dat" -e "/home/user1/error.dat" long_job
Job <1236> is submitted to default queue <normal>
% bswitch night 1236
Job <1236> is switched to queue <night>
Example #4
% bsub -P Research -J Projection1 project_job
Job <1238> is submitted to default queue <normal>
% btop 1238
Job <1238> has been moved to position 1 from top
% bbot 1238
Job <1238> has been moved to position 1 from bottom

41 Job arrays
A job array is a Platform LSF structure that allows a sequence of jobs to:
Share the same executable
Have different input files
Be controlled and monitored as a single job unit
Once the job array has been submitted, the individual elements of the array are independently scheduled and dispatched.
Examples: regression testing, rendering, and file conversion.

42 Creating job arrays (1 of 2)
Syntax: bsub -J "AName[index, ...]" -i "in.%I" appbin
Where:
-J "AName[...]" names and creates the job array
index can be start[-end[:step]], where:
start specifies an individual index or the beginning of a range of indices
end specifies the end of a range of indices
step specifies the value by which to increment indices in a range
-i "in.%I" is the input file to be used, and %I is the array index reference value
appbin is the application to be executed

43 Creating job arrays (2 of 2)
By default, a maximum of 1000 elements can be submitted at one time using a job array.
To allow a greater number of elements in a job array submission, set in lsb.params:
MAX_JOB_ARRAY_SIZE = integer
where integer can be from 1 to ...
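A minimal sketch of the corresponding lsb.params stanza (the value 10000 is just an example; run badmin reconfig afterwards so the change takes effect):
Begin Parameters
MAX_JOB_ARRAY_SIZE = 10000
End Parameters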

44 Submitting job arrays examples (1 of 3)
Example #1
% bsub -J "test[1-1000]" -i "in.%I" -o "out.%J.%I" appa
Job <104> submitted to default queue <normal>
%I is the index value of the job array and %J is the job ID assigned to the submitted job array.
Example output files: out.104.1, out.104.2, out.104.3, out.104.4, ...
Example #2
% bsub -J "test[1-100:3]" -i "in.%I" -o "out.%J.%I" appa
Job <105> submitted to default queue <normal>
Steps through the index values in increments of three.
Example output files: out.105.1, out.105.4, out.105.7, ..., out.105.100

45 Submitting job arrays examples (2 of 3)
Example #3
% bsub -J "test[1-10,45,90]" -i "in.%I" -o "out.%J.%I" appa
Job <106> submitted to default queue <normal>
Specific index values can be specified.
Example output files: out.106.1, out.106.2, out.106.3, ..., out.106.45, out.106.90

46 Submitting job arrays examples (3 of 3)
Example #4
% bsub -J "test[1-200]%2" -i "input.%I" appa
Job <108> submitted to default queue <normal>
Only 2 elements from this job array may run concurrently in the cluster at one time.
[lsfuser@host1 ~]$ bjobs -A 108
JOBID ARRAY_SPEC OWNER NJOBS PEND DONE RUN EXIT SSUSP USUSP PSUSP
108 test[1-2 lsfuser 200 ...

47 Monitoring job arrays
View all elements of the job array:
% bjobs 104
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
104 user1 RUN normal training8 training7 test[1] Jun 6 15:40
104 user1 RUN normal training8 training5 test[2] Jun 6 15:40
104 user1 PEND normal training8 test[3] Jun 6 15:40
View a single element of the job array:
% bjobs "104[90]"
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
104 user1 PEND normal training8 test[90] Jun 6 15:40
View the summary status of the job array:
% bjobs -A 104
JOBID ARRAY_SPEC OWNER NJOBS PEND DONE RUN EXIT SSUSP USUSP PSUSP
104 test user1 1000 ...

48 Terminating job arrays
To terminate all elements of an array:
% bkill 104
Job <104>: Operation is in progress
To terminate an individual element:
% bkill "104[90]"
Job <104[90]> is being terminated
To terminate a group of elements:
% bkill "104[1-10,75,90]"
Job <104[1]> is being terminated
Job <104[10]> is being terminated
Job <104[75]> is being terminated
Job <104[90]> is being terminated

49 Resource Requirement String

50 Resource requirements Each job or task can have its own resource requirements. Server hosts that match the resource requirements of the job are considered as candidates for execution. Can be set in queues and/or for individual jobs.

51 Resource requirement string (1 of 2)
Job-level resource requirement: bsub -R
Queue-level and application profile resource requirement: RES_REQ
Describes the resources required by a job or task.
Used for mapping tasks and jobs onto execution hosts.
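As a sketch, a queue-level requirement might be configured in lsb.queues like this (the queue name and values are illustrative):
Begin Queue
QUEUE_NAME = bigmem
RES_REQ = select[mem>500] rusage[mem=500]
End Queue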

52 Resource requirement string (2 of 2)
A resource requirement string is divided into the following sections:
Selection - select[selection_string]
Ordering - order[order_string]
Usage - rusage[rusage_string]
Locality - span[span_string]
Same - same[same_string]
Compute unit - cu[cu_string]
Affinity - affinity[affinity_string]
Note: cu[] and affinity[] are covered in the advanced LSF class. The span and same sections are specifically used for parallel jobs. A combined example follows.
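For instance, a single submission might combine several sections (the resource values and application name are illustrative):
$ bsub -n 8 -R "select[type==any && mem>1000] order[mem] rusage[mem=1000:duration=30] span[hosts=1]" parallel_app
This selects hosts with more than 1000 MB of available memory, orders candidates by available memory, reserves 1000 MB for 30 minutes, and requires all eight processors to be on one host.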

53 Selection string (1 of 2)
A logical expression built from a set of resource names.
Specifies the characteristics of a server host to be considered as a potential execution host.
Evaluated for each host: if the expression evaluates to true, that host is considered a candidate.

54 Selection string (2 of 2)
The select keyword can be omitted if the selection section is first in the resource requirement string.
The default selection string for execution type commands such as bsub or lsrun is: select[type==local]
The default selection string for query type commands such as lsload is: select[type==any]
The default selection string for query and execution type commands such as bsub or lsrun on a client host is: select[type==any]

55 Selection string examples (1 of 2)
$ bsub -R "select[type==any && swp>=300 && mem>500]" job1
Select a candidate execution host of any type which has at least 300 MB of available swap and more than 500 MB of available memory.
$ lsload -R "select[type==local && cpuf<18.0]"
Displays all candidate execution hosts of the same type as the submission host which have a CPU factor less than 18.0.

56 Selection string examples (2 of 2)
$ bsub -R "select[(ut<0.50 && ncpus==2) || (ut<0.75 && ncpus==4)]" job2
Select a candidate execution host where the CPU utilization is less than 0.50 and the number of CPUs is two, or the CPU utilization is less than 0.75 and the number of CPUs is four.
$ bsub -R "select[(type==SUNSOL && swap>300) || (type==HPPA && swap>400)]" task1
Select a candidate execution host where the type is SUNSOL and there is more than 300 MB of available swap, or where the type is HPPA and there is more than 400 MB of available swap.

57 Order string
Allows candidate execution hosts to be sorted according to the value of resources.
The first index is the primary sort key, the second is the secondary sort key, and so on.
Hosts are ordered from best to worst on the given index or indices.
If defined at both the job and queue level, the job level takes precedence.
The default order string is order[r15s:pg].

58 Order string examples (1 of 2)
$ bsub -R "select[type==any && swp>=300 && mem>500] order[mem]" job1
Orders the candidate execution hosts from the highest to the lowest amount of available memory.
$ lsload -R "select[type==local && cpuf<18.0] order[cpuf]"
Orders the candidate execution hosts from highest to lowest CPU factor.
If the order section is not specified, the candidate execution hosts are ordered using the default order string [r15s:pg].

59 Order string examples (2 of 2)
$ bsub -R "select[ut<0.8 && mem>200] order[r1m:ut:-mem]" small_mem_job.sh
Orders the candidate execution hosts from lowest to highest one-minute run queue length, from lowest to highest CPU utilization, and from lowest to highest amount of available memory.

60 Resource usage string
Specifies resource reservations for jobs on execution hosts.
Ignored when running interactive tasks.
By default, no resources are reserved.
If rusage is defined at both the job level and the queue level, the job level takes precedence.
The keywords duration and decay can be used.
If a job can run with more than one rusage string, it is possible to specify multiple strings with an OR operator (||) and have Platform LSF pick the first one that matches.

61 Resource usage string examples (1 of 3)
$ bsub -R "select[type==any && swap>=300 && mem>500] order[swap:mem] rusage[swap=300,mem=500]" job1
On the selected execution host, reserve 300 MB of swap space and 500 MB of memory for the duration of the job.
$ bsub -R "rusage[mem=500:app_lic_v2=1 || mem=400:app_lic_v1.5=1]" job1
The job will use 500 MB with app_lic_v2, or 400 MB with app_lic_v1.5.
Resource reservation is ignored for interactive tasks (that is, lsload, lsrun).

62 Resource usage string examples (2 of 3)
$ bsub -R "select[ut<0.50 && ncpus==2] rusage[ut=0.50:duration=20:decay=1]" job2
On the selected execution host, reserve 50% of the CPU utilization and linearly decay the amount of CPU utilization reserved over the duration of the period.
$ bsub -R "select[type == SUNSOL && mem > 300] rusage[mem=300:duration=1h]" job3
On the selected execution host, reserve 300 MB of memory for one hour.

63 Resource usage string examples (3 of 3)
$ bsub -q reserve -R "rusage[res1=4,tmp=1000]" job1
% bjobs -lp
Job <215>, User <john>, Project <default>, Status <PEND>, Queue <reserve>, Command <job1>
Thu Jul 24 07:12:16: Submitted from host <delpe07>, CWD </home/john>, Requested Resources <rusage[res1=4,tmp=1000]>;
Thu Jul 24 07:12:21: Reserved <1> job slot on host <delpe07>;
Thu Jul 24 07:12:21: Reserved <581> megabytes tmp on host <581M*delpe07>;
Thu Jul 24 07:12:21: Reserved <2> res1

64 Span string
Specifies the locality of a parallel job. Supported options:
span[hosts=1] indicates that all processors allocated to this job must be on the same execution host.
span[ptile=n] indicates that up to n processors on each execution host should be allocated to the job.
span[ptile='!',host_type:n] uses the predefined maximum job slot limit in lsb.hosts (MXJ per host type/model) as the value for host models or types other than those specified.
When defined at both the job level and the queue level, the job-level definition takes precedence.

65 Span string examples
$ bsub -n 16 -R "select[ut<0.15] order[ut] span[hosts=1]" parallel_job1
All processors required to complete this job must reside on the same execution host.
$ bsub -n 16 -R "select[ut<0.15] order[ut] span[ptile=2]" parallel_job2
Up to two CPUs per execution host can be used to execute this job; therefore, at least eight execution hosts are required to complete it.

66 Same string
Used to specify that all processes of a parallel job must run on hosts with the same resources.
The parallel job scheduler plugin must be installed to use this option.
When defined at both the job level and the queue level, both requirements are combined to allocate processors.
Any static resource can be specified.

67 Same string examples
$ bsub -n 64 -R "select[type==SGI6 || type==SOL7] same[type]" parallel_job1
Run all parallel processes on the same host type, either SGI6 or SOL7, but not both.
$ bsub -n 64 -R "select[type==any] same[type:model]" parallel_job2
Run all parallel processes on the same host type and model.

68 Multiple resource requirements
Support for multiple resource requirement string (-R) options, so the administrator can easily change resource requirements.
In a job-level submission:
bsub -R "select[swp > 15]" -R "select[hpux] order[r15m]" -R "rusage[mem=100]" -R "order[ut]" -R "same[type]" -R "rusage[tmp=50:duration=60]" -R "same[model]" myjob
Platform LSF merges the multiple -R options into one string and selects a host that meets all of the resource requirements.
The number of -R option sections is unlimited, up to a maximum of 512 characters for the entire string per -R option.

69 Support multiple resource requirements (1 of 3)
Merges job-level, application-level, and queue-level resource requirement strings.
Queue level defines: RES_REQ = rusage[R1=V1 || R2=V2 || R3=V3 || R4=V4]
Job-level resource requirements:
# bsub -R "rusage[R5=V5:duration=60]"
Expected merge result:
rusage[R5=V5:duration=60,R1=V1 || R5=V5:duration=60,R2=V2 || R5=V5:duration=60,R3=V3 || R5=V5:duration=60,R4=V4]

70 Support multiple resource requirements (2 of 3)
Commands that support multiple -R options: bsub, bmod
Supports the order, rusage, select, and same sections.
Multiple span clauses are not supported.

71 Support multiple resource requirements (3 of 3)
Example:
Queue-level resource definition: RES_REQ = rusage[swp=5000 || mem=100]
Job-level resource definition:
bsub -R "rusage[tmp=50 || mem=50:duration=60]" -q test job1
Job <2> is submitted to queue <test>
# bjobs -l 2
Job <2>, User <usr1>, Project <default>, Status <RUN>, Queue <test>, Command <job1>
Tue Apr 10 03:37:52: Submitted from host <lsfhost1>, CWD </>, Requested Resources <rusage[tmp=50 || mem=50:duration=60]>;
Tue Apr 10 03:37:58: Started on <lsfhost2>, Execution Home </home/usr1>, Execution CWD </>, Execution rusage <[tmp=50.00,mem=100.00]>;
RESOURCE REQUIREMENT DETAILS:
Combined: select[type == local] order[r15s:pg] rusage[tmp=50.00:swp=5000.00 || tmp=50.00:mem=100.00 || mem=50.00:duration=1h:decay=0,swp=5000.00 || mem=50.00:duration=1h:decay=0,mem=100.00]

72 Resource requirement priorities
If application-level resource requirements are not defined, job-level and queue-level resource requirements are merged. If application-level resource requirements are defined, job-level requirements usually take precedence. Specifically:
The select sections from the job, application profile, and queue must all be satisfied.
The job-level order section overrides the application profile section, which overrides the queue-level section.
The job-level rusage section takes precedence. Any rusage requirements defined in the application profile that are not already specified at the job level are then merged. Any queue-level requirements are then merged with that result.
The job-level span section overrides the application profile span section, which overrides a queue-level section.
The same sections from the job, application profile, and queue must all be satisfied.
A small worked example follows.
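As a hypothetical illustration of these rules, suppose the queue defines RES_REQ = order[ut] rusage[mem=200] and a job is submitted with:
$ bsub -R "order[mem] rusage[swp=100]" myjob
With no application profile involved, the effective order section is order[mem] (the job level overrides the queue level), while the effective rusage is the merge of both levels: rusage[swp=100,mem=200].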

73 Platform LSF cluster query commands

74 Unit objectives
After completing this unit, you should be able to:
View cluster, resource, and queue information
View job information, including resource usage, history, and output
View host and user groups

75 Platform LSF cluster query commands
UNIX/Linux manual pages exist for all commands: # man commandname
LIM commands:
lsid
lsinfo [-l] [-m | -M] [-r] [-t] [resource_name ...]
lshosts [-w | -l] [-r "res_req"] [host_name ...]
lsload [-l] [-N | -E] [-n num_hosts] [-r "res_req"] [host_name ...]
lsmon [-N | -E] [-n num_hosts] [-r "res_req"] [-i interval] [-L file_name] [host_name ...]
Batch commands:
bhosts [-e | -l | -w] [-x] [-X] [-r "res_req"] [host_name | host_group] ...
bmgroup [-r] [-l] [-w] [host_group ...]
bqueues [-w | -l | -r] [-m host_name | -m all] [queue_name ...]
busers [user_name ... | user_group ... | all]
bugroup [-l] [-r] [-w] [user_group ...]

76 View cluster and master names
Display the LSF version number, cluster name, and master host:
# lsid
IBM Platform LSF Standard 10.1.0.0, Feb ...
Copyright International Business Machines Corp, ...
US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
My cluster name is cluster01
My master name is host1.ibm.edu
Display load sharing configuration information:
# lsinfo
RESOURCE_NAME TYPE ORDER DESCRIPTION
r15s Numeric Inc 15-second CPU run queue length
mem Numeric Dec Available memory (Mbytes)
ncpus Numeric Dec Number of CPUs
ndisks Numeric Dec Number of local disks
mg Boolean N/A Management hosts
TYPE_NAME
UNKNOWN_AUTO_DETECT
DEFAULT
LINUX
MODEL_NAME CPU_FACTOR ARCHITECTURE
Intel_EM64T ... x15_6789_intelrxeon

77 Server host configuration information
Display static resource information about hosts:
# lshosts
HOST_NAME type model cpuf ncpus maxmem maxswp server RESOURCES
host1.ibm.e X86_64 Intel_EM64T ... ... ...G 5.7G Yes (mg)
host2 X86_64 Intel_EM64T ... ... ...G 5.7G Yes ()
host3 X86_64 Intel_EM64T ... ... ...G 5.7G Yes ()
host4 X86_64 Intel_EM64T ... ... ...G 5.7G Yes ()
Display detailed static resource information about a particular host:
# lshosts -l host1
HOST_NAME: host1.ibm.edu
type model cpuf ncpus ndisks maxmem maxswp maxtmp rexpri server nprocs ncores nthreads
X86_64 Intel_EM64T ... ... ... ...G 5.7G 5799M 0 Yes ... ... ...
RESOURCES: (mg)
RUN_WINDOWS: (always open)
LOAD_THRESHOLDS:
r15s r1m r15m ut pg io ls it tmp swp mem

78 Server host load information
Display load information for hosts:
# lsload
HOST_NAME status r15s r1m r15m ut pg ls it tmp swp mem
host1.ibm.edu ok ... ...% ...M 5.7G 3.5G
host2 ok ... ...% ...M 5.7G 3.5G
host3 ok ... ...% ...M 5.7G 3.5G
host4 ok ... ...% ...M 5.7G 1.1G
Display load information without truncation (including disk I/O):
# lsload -l
HOST_NAME status r15s r1m r15m ut pg io ls it tmp swp mem
host2 ok ... ...% ...M 5.7G 3.5G
host3 ok ... ...% ...M 5.7G 3.5G
host4 ok ... ...% ...M 5.7G 1.1G
host1.ibm.edu ok ... ...% ...M 5.7G 3.5G
Display current load information with periodic updates:
# lsmon
Hostname: host1.ibm.edu Refresh rate: 10 secs
HOST_NAME status r15s r1m r15m ut pg ls it tmp swp mem
host2 ok ... ...% ...M 5.7G 3.5G
host3 ok ... ...% ...M 5.7G 3.5G
host1.ibm.edu ok ... ...% ...M 5.7G 3.5G
host4 ok ... ...% ...M 5.7G 1.1G

79 Batch server host information
Display hosts and their static and dynamic resources:
# bhosts
HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
host1.ibm.edu ok ...
host2 ok ...
host3 ok ...
host4 ok ...
Display detailed resource information about a particular host:
# bhosts -l host1
HOST host1.ibm.edu
STATUS CPUF JL/U MAX NJOBS RUN SSUSP USUSP RSV DISPATCH_WINDOW
ok ...
CURRENT LOAD USED FOR SCHEDULING:
r15s r1m r15m ut pg io ls it tmp swp mem slots
Total ... ...% ...M 5.7G 3.5G 2
Reserved ... ...% ...M 0M 0M -
LOAD THRESHOLD USED FOR SCHEDULING:
r15s r1m r15m ut pg io ls it tmp swp mem
loadSched - - - - - - - - - - -
loadStop - - - - - - - - - - -

80 Queue information
Display information about queues:
# bqueues
QUEUE_NAME PRIO STATUS MAX JL/U JL/P JL/H NJOBS PEND RUN SUSP
priority 43 Open:Active ...
night 40 Open:Active ...
chkpnt_rerun_qu 40 Open:Active ...
normal 30 Open:Active ...
interactive 30 Open:Active ...
idle 20 Open:Active ...
Display detailed information about queues:
# bqueues -l normal
QUEUE: normal
-- For normal low priority jobs, running only if hosts are lightly loaded. This is the default queue.
PARAMETERS/STATISTICS
PRIO NICE STATUS MAX JL/U JL/P JL/H NJOBS PEND RUN SSUSP USUSP RSV
30 ... Open:Active ...
Interval for a host to accept two jobs is 0 seconds
SCHEDULING PARAMETERS
r15s r1m r15m ut pg io ls it tmp swp mem
loadSched - - - - - - - - - - -
loadStop - - - - - - - - - - -
SCHEDULING POLICIES: NO_INTERACTIVE
USERS: all
HOSTS: all

81 User and user group information
Display user information. By default, busers displays information for the current user; specify all to see all user and group information.
# busers all
USER/GROUP JL/P MAX NJOBS PEND RUN SSUSP USUSP RSV
default ...
user1 ...
user2 ...
usergroupa ...
Display user group member information:
# bugroup
GROUP_NAME USERS
usergroupa user2 user3
usergroupb UnixAdminGrp NISgroup NTgroup


More information

CAM Tutorial. configure, build & run. Dani Coleman July

CAM Tutorial. configure, build & run. Dani Coleman July CAM Tutorial configure, build & run Dani Coleman bundy@ucar.edu July 27 2009 CAM is a subset of CCSM Atmosphere Data Ocean Land Data Sea Ice Documentation of CAM Scientific description: http://www.ccsm.ucar.edu/models/atm-cam/docs/description/

More information

Command-line interpreters

Command-line interpreters Command-line interpreters shell Wiki: A command-line interface (CLI) is a means of interaction with a computer program where the user (or client) issues commands to the program in the form of successive

More information

5/20/2007. Touring Essential Programs

5/20/2007. Touring Essential Programs Touring Essential Programs Employing fundamental utilities. Managing input and output. Using special characters in the command-line. Managing user environment. Surveying elements of a functioning system.

More information

Using the Yale HPC Clusters

Using the Yale HPC Clusters Using the Yale HPC Clusters Stephen Weston Robert Bjornson Yale Center for Research Computing Yale University Oct 2015 To get help Send an email to: hpc@yale.edu Read documentation at: http://research.computing.yale.edu/hpc-support

More information

(MCQZ-CS604 Operating Systems)

(MCQZ-CS604 Operating Systems) command to resume the execution of a suspended job in the foreground fg (Page 68) bg jobs kill commands in Linux is used to copy file is cp (Page 30) mv mkdir The process id returned to the child process

More information

Running the model in production mode: using the queue.

Running the model in production mode: using the queue. Running the model in production mode: using the queue. 1) Codes are executed with run scripts. These are shell script text files that set up the individual runs and execute the code. The scripts will seem

More information

Using Platform LSF HPC

Using Platform LSF HPC Using Platform LSF HPC Version 7 Update 5 Release date: March 2009 Last modified: March 13, 2009 Support: support@platform.com Comments to: doc@platform.com Copyright We d like to hear from you 1994-2009,

More information

EECS2301. Lab 1 Winter 2016

EECS2301. Lab 1 Winter 2016 EECS2301 Lab 1 Winter 2016 Lab Objectives In this lab, you will be introduced to the Linux operating system. The basic commands will be presented in this lab. By the end of you alb, you will be asked to

More information

HTCondor Essentials. Index

HTCondor Essentials. Index HTCondor Essentials 31.10.2017 Index Login How to submit a job in the HTCondor pool Why the -name option? Submitting a job Checking status of submitted jobs Getting id and other info about a job

More information

QuickStart: Deploying Platform Symphony 3.1 on HP Clusters

QuickStart: Deploying Platform Symphony 3.1 on HP Clusters QuickStart: Deploying Platform Symphony 3.1 on HP Clusters 1 Overview... 2 2 Platform Symphony: Quick Overview... 2 2.1 Platform Symphony... 3 2.2 Platform EGO... 3 2.3 Platform Management Console... 3

More information

Monitoring Agent for Unix OS Version Reference IBM

Monitoring Agent for Unix OS Version Reference IBM Monitoring Agent for Unix OS Version 6.3.5 Reference IBM Monitoring Agent for Unix OS Version 6.3.5 Reference IBM Note Before using this information and the product it supports, read the information in

More information

Grid Computing in SAS 9.4

Grid Computing in SAS 9.4 Grid Computing in SAS 9.4 SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2013. Grid Computing in SAS 9.4. Cary, NC: SAS Institute Inc. Grid Computing

More information

Migrate Platform LSF to Version 7 on Windows. Platform LSF Version 7.0 Update 6 Release date: August 2009 Last modified: August 17, 2009

Migrate Platform LSF to Version 7 on Windows. Platform LSF Version 7.0 Update 6 Release date: August 2009 Last modified: August 17, 2009 Migrate Platform LSF to Version 7 on Windows Platform LSF Version 7.0 Update 6 Release date: August 2009 Last modified: August 17, 2009 Copyright 1994-2009 Platform Computing Inc. Although the information

More information

Release Notes for Platform Process Manager. Platform Process Manager Version 8.2 May 2012

Release Notes for Platform Process Manager. Platform Process Manager Version 8.2 May 2012 Release Notes for Platform Process Manager Platform Process Manager Version 8.2 May 2012 Copyright 1994-2012 Platform Computing Corporation. Although the information in this document has been carefully

More information

Mid Term from Feb-2005 to Nov 2012 CS604- Operating System

Mid Term from Feb-2005 to Nov 2012 CS604- Operating System Mid Term from Feb-2005 to Nov 2012 CS604- Operating System Latest Solved from Mid term Papers Resource Person Hina 1-The problem with priority scheduling algorithm is. Deadlock Starvation (Page# 84) Aging

More information

Configuring the Oracle Network Environment. Copyright 2009, Oracle. All rights reserved.

Configuring the Oracle Network Environment. Copyright 2009, Oracle. All rights reserved. Configuring the Oracle Network Environment Objectives After completing this lesson, you should be able to: Use Enterprise Manager to: Create additional listeners Create Oracle Net Service aliases Configure

More information

Using Platform LSF HPC Features

Using Platform LSF HPC Features Using Platform LSF HPC Features Version 8 Release date: January 2011 Last modified: January 10, 2011 Support: support@platform.com Comments to: doc@platform.com Copyright We d like to hear from you 1994-2011,

More information

unix intro Documentation

unix intro Documentation unix intro Documentation Release 1 Scott Wales February 21, 2013 CONTENTS 1 Logging On 2 1.1 Users & Groups............................................. 2 1.2 Getting Help...............................................

More information

OpenPBS Users Manual

OpenPBS Users Manual How to Write a PBS Batch Script OpenPBS Users Manual PBS scripts are rather simple. An MPI example for user your-user-name: Example: MPI Code PBS -N a_name_for_my_parallel_job PBS -l nodes=7,walltime=1:00:00

More information

Intel Manycore Testing Lab (MTL) - Linux Getting Started Guide

Intel Manycore Testing Lab (MTL) - Linux Getting Started Guide Intel Manycore Testing Lab (MTL) - Linux Getting Started Guide Introduction What are the intended uses of the MTL? The MTL is prioritized for supporting the Intel Academic Community for the testing, validation

More information

An Introduction to Cluster Computing Using Newton

An Introduction to Cluster Computing Using Newton An Introduction to Cluster Computing Using Newton Jason Harris and Dylan Storey March 25th, 2014 Jason Harris and Dylan Storey Introduction to Cluster Computing March 25th, 2014 1 / 26 Workshop design.

More information

LSF Make. Platform Computing Corporation

LSF Make. Platform Computing Corporation LSF Make Overview LSF Make is only supported on UNIX. LSF Batch is a prerequisite for LSF Make. The LSF Make product is sold, licensed, distributed, and installed separately. For more information, contact

More information

Course Syllabus. Operating Systems

Course Syllabus. Operating Systems Course Syllabus. Introduction - History; Views; Concepts; Structure 2. Process Management - Processes; State + Resources; Threads; Unix implementation of Processes 3. Scheduling Paradigms; Unix; Modeling

More information

A Distributed Load Sharing Batch System. Jingwen Wang, Songnian Zhou, Khalid Ahmed, and Weihong Long. Technical Report CSRI-286.

A Distributed Load Sharing Batch System. Jingwen Wang, Songnian Zhou, Khalid Ahmed, and Weihong Long. Technical Report CSRI-286. LSBATCH: A Distributed Load Sharing Batch System Jingwen Wang, Songnian Zhou, Khalid Ahmed, and Weihong Long Technical Report CSRI-286 April 1993 Computer Systems Research Institute University of Toronto

More information

Using ISMLL Cluster. Tutorial Lec 5. Mohsan Jameel, Information Systems and Machine Learning Lab, University of Hildesheim

Using ISMLL Cluster. Tutorial Lec 5. Mohsan Jameel, Information Systems and Machine Learning Lab, University of Hildesheim Using ISMLL Cluster Tutorial Lec 5 1 Agenda Hardware Useful command Submitting job 2 Computing Cluster http://www.admin-magazine.com/hpc/articles/building-an-hpc-cluster Any problem or query regarding

More information

IBM Platform LSF. Best Practices. IBM Platform LSF and IBM GPFS in Large Clusters. Jin Ma Platform LSF Developer IBM Canada

IBM Platform LSF. Best Practices. IBM Platform LSF and IBM GPFS in Large Clusters. Jin Ma Platform LSF Developer IBM Canada IBM Platform LSF Best Practices IBM Platform LSF 9.1.3 and IBM GPFS in Large Clusters Jin Ma Platform LSF Developer IBM Canada Table of Contents IBM Platform LSF 9.1.3 and IBM GPFS in Large Clusters...

More information

Release Notes for Platform Process Manager. Platform Process Manager Version 8.1 January 2011 Last modified: January 2011

Release Notes for Platform Process Manager. Platform Process Manager Version 8.1 January 2011 Last modified: January 2011 Release Notes for Platform Process Manager Platform Process Manager Version 8.1 January 2011 Last modified: January 2011 Copyright 1994-2011 Platform Computing Corporation. Although the information in

More information