The following bugs have been fixed in LSF Version 9.1.3 Service Pack 2 between 30th May 2014 and 31st January 2015:


Date 1. When advance reservation files exist (lsb.rsv.id, lsb.rsv.stat), LSF should log messages at the ERROR level if mbatchd fails to open the files for reading. 2. When advance reservation files do not exist, LSF should log messages at the LOG_INFO level. 3. mbatchd should always use the LSF primary administrator account to access these files under LSB_SHAREDIR. Currently, when mbatchd starts up, mbatchd uses root to read advance reservation files. Users do not know when the advance reservation file is not accessible.

P Date If sbatchd is not responding or is unavailable when mbatchd attempts to send modification information to it, sbatchd never receives the modification information. Component sbatchd Using bmod to change a running job's run time limit does not take effect when sbatchd is unavailable.

P Date There are problems with guarantee and preemption when high-priority, guaranteed consumer jobs remain pending even though resource requirements are met. Component mbschd schmod_default.so Some pending jobs cannot run even when resources meet the requirements.

P Date When checkpoint jobs are submitted by script, brestart -W does not work and restarted jobs cannot be terminated by RUNLIMIT.

Component sbatchd erestart After running brestart, the job cannot exit and keeps showing "Checkpoint initiated" and "Checkpoint succeeded" in sequence.

P Date When executing a job containing multiple tasks, the task RES calculates an incorrect XDR size and causes an XDR encoding error. Component res Jobs that are launched by blaunch fail to execute.

P Date In MultiCluster lease mode, a buffer overflow occurs when the lease.state.file is too large, causing an mbatchd core dump. mbatchd core dumps and LSF no longer works.

P Date When mbschd encounters an error on a job, other jobs do not get scheduled and remain pending. Component mbschd Jobs remain pending until an administrator runs badmin reconfig.

P Date In MultiCluster forward mode, a parallel job submitted with a same[] resource requirement (RES_REQ) is blocked. Component mbschd schmod_mc.so Jobs remain pending indefinitely at the submission cluster.

P Date When the lsb_submit() API is used to submit jobs for both a parent and a child process, jobs submitted for the child process do not process LSB_SUB_MODIFY_FILE and LSB_SUB_MODIFY_ENVFILE properly. Component liblsf.a liblsf.so libbat.a libbat.so lsbatch.h lsf.h esub does not work together with the lsb_submit API.

P Date When shared resources are configured for the cluster, vemkd reports a warning message: "lsfinit: resource <resource_name> is being used by multiple hosts. It cannot be used in a resource requirement expression." Component vemkd Several error messages are logged in the vemkd log file even with a correct configuration.

P Date When preemption and guaranteed SLA are both enabled, mbschd takes a long time to finish one scheduling cycle, especially when there are tens of thousands of pending jobs. Component mbschd The mbschd performance issue causes low job throughput in the LSF cluster.

P Date When using bsub to submit a job, a core dump occurs if the specified command and its arguments contain multiple quotations. Component bsub Jobs cannot be submitted if the command and its arguments contain multiple quotations.

P Date After enabling LSB_QUERY_ENH in lsf.conf, if there are many bhosts requests in the cluster and there is an affinity host in the cluster, or if affinity is enabled in the cluster, the query child mbatchd core dumps repeatedly. The core dump is caused by a "thread unsafe function". The child mbatchd core dumps, which causes b* query commands to fail.

P Date When using the brestart command with Gold Integration jobs, the command fails due to missing job information such as a project name or job ID. Component brestart It is difficult to integrate GOLD with jobs restarted with brestart.

P Date When running a long pre-execution job, if the job is killed before the pre-execution script finishes, eexec cannot get the environment variable LSF_JOB_EXECUSER. Component sbatchd Several GOLD reservations are not released if gcharge fails and the job is killed.

P Date When LSF_TMPDIR is set to a shared file system, esub sometimes does not work because the temp file for one job is overwritten by another job. Component bsub Wrong or missing job submission options are set in esub.

P Date When a license error occurs ("Unable to contact LIM"), mbatchd exits. The error message is confusing.

P Date This fix introduces five performance enhancements for the LSF scheduler. You must individually enable each of the following enhancements with a parameter in lsf.conf:
1. LSB_SHARED_RSRC_ENH=Y (lsf.conf) LSF allows you to configure multiple instances of a (site-defined) shared resource. For example, for a shared resource "R", there may be one instance consisting of 10 units of R that is available on hosts 1 and 2, and a second instance consisting of 10 units of R that is available on hosts 3 and 4. Each host can be associated with up to one instance. If a job specifies a shared resource in its rusage string and LSF discovers that the job cannot use one host because of a lack of the resource, other hosts are also checked since there may be multiple instances of the resource. In the special case of a single resource instance for the cluster (for example, representing a floating software license), LSF would ideally not consider any other hosts for the job. When you set LSB_SHARED_RSRC_ENH=Y, after LSF finds that an insufficient amount of a single-instance shared resource is available on one host, LSF will not consider other hosts for the job.
2. LSB_SKIP_FULL_HOSTS=Y (lsf.conf) LSF removes unusable hosts from consideration at the beginning of each scheduling session. For example, hosts that are down (unavail or unreach), closed by the administrator (closed_adm), or closed due to a load threshold (closed_busy) are unusable by any job and can be removed from consideration. Removing these hosts early in the scheduling session helps with performance. Hosts with all slots occupied (closed_full) are not removed, since these hosts can still be used by jobs in preemptive queues if queue-based preemption is enabled. For sites without preemption configured, it is not necessary for LSF to consider full hosts. When you set LSB_SKIP_FULL_HOSTS=Y, LSF removes full hosts from consideration at the beginning of each scheduling session, as long as either the preemption plug-in is not loaded or there is no preemption relationship between queues (for more details, see the PREEMPTION parameter in lsb.queues).
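For illustration only, a minimal lsf.conf sketch that enables the two enhancements described above could look like this (the restart step is the usual procedure for lsf.conf changes and is an assumption, not part of the fix text):

# lsf.conf (sketch): enable the two scheduler enhancements described above
LSB_SHARED_RSRC_ENH=Y
LSB_SKIP_FULL_HOSTS=Y

After editing lsf.conf, restarting mbatchd (for example, with badmin mbdrestart) should make the new values visible in the badmin showconf mbd output.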

Component mbatchd mbschd schmod_default.so schmod_reserve.so schmod_preemption.so schmod_affinity.so schmod_parallel.so schmod_advrsv.so schmod_aps.so schmod_bluegene.so schmod_cpuset.so schmod_craylinux.so schmod_crayx1.so schmod_dc.so schmod_dist.so schmod_fairshare.so schmod_fcfs.so schmod_jobweight.so schmod_limit.so schmod_mc.so schmod_ps.so schmod_pset.so schmod_rms.so schmod_xl.so Lower mbschd performance has a large impact on the job dispatching rate in a cluster.

P Date When a parallel job finishes, LSF reports an incorrect MAXMEM value. Component res LSF reports a larger memory usage, which has an impact on the user's analytical tools.

P Date In a mixed cluster environment, using a bpeek command on a job that is running on another host occasionally fails. Component bpeek bpeek occasionally does not work.

P Date Cannot backfill the reserved job when the job is an exclusive job. Component mbschd schmod_reserve.so Short jobs cannot use backfill slots.

P Date Compute unit resource requirements cannot work with leased-in hosts.
1. This fix allows you to specify leased-in hosts in the definition of a compute unit. For example:
Begin ComputeUnit
NAME MEMBER      TYPE
en1  (host1@mc1) (enclosure)
en2  (ho*@mc1)   (enclosure)
End ComputeUnit
Note: You must define a valid host name of a leased-in host in the MEMBER column. The badmin reconfig command does not log an error or warning message if you specify an invalid host name. mbatchd only logs the error or warning message in the mbatchd log file after mbatchd gets the leased-in host information from the remote cluster.
2. This fix allows dynamic hosts and leased-in hosts to join a compute unit (after you run badmin reconfig to apply the changes). This allows jobs with a compute unit resource requirement to be dispatched to the new dynamic hosts and leased-in hosts.
Component mbschd Users cannot specify compute unit resource requirements to use leased-in hosts.

P Date The run time value in the lsb.stream file is incorrect when brequeue is used to requeue a job to pending status and the job is then killed before resuming. Requeued jobs that are killed while pending are logged in the Platform Analytics database with a long run time.

P Date When there are no more records, lsb_readjobinfo() returns error code 53 instead of 47.

Component liblsf.a liblsf.so libbat.a libbat.so lsf.h lsbatch.h LSF API client code fails to detect the case when there are no more job records in mbatchd.

P Date After running lsrun on some hosts, the lsload and lsload -E commands display incorrect r15s and r1m values for those hosts. Component lim The lsload, lsload -E, and lsload -N commands display incorrect results after running lsrun on some hosts.

P Date If using both compute units and affinity, some jobs may cause an mbschd core dump. Component mbschd schmod_parallel.so schmod_reserve.so mbschd core dumps regularly, which causes a low job dispatch rate.

P Date CPU binding does not work with LSF_BIND_JOB after LSF is started using the 'lsf_daemons start' command. Component sbatchd Jobs cannot be bound after running 'lsf_daemons start'.

P Date This fix introduces five performance enhancements for the LSF scheduler. You must individually enable each of the following enhancements with a parameter in lsf.conf:
1. LSB_SHARED_RSRC_ENH=Y (lsf.conf) LSF allows you to configure multiple instances of a (site-defined) shared resource. For example, for a shared resource "R", there may be one instance consisting of 10 units of R that is available on hosts 1 and 2, and a second instance consisting of 10 units of R that is available on hosts 3 and 4. Each host can be associated with up to one instance. If a job specifies a shared resource in its rusage string and LSF discovers that the job cannot use one host because of a lack of the resource, other hosts are also checked since there may be multiple instances of the resource. In the special case of a single resource instance for the cluster (for example, representing a floating software license), LSF would ideally not consider any other hosts for the job. When you set LSB_SHARED_RSRC_ENH=Y, after LSF finds that an insufficient amount of a single-instance shared resource is available on one host, LSF will not consider other hosts for the job.
2. LSB_SKIP_FULL_HOSTS=Y (lsf.conf) LSF removes unusable hosts from consideration at the beginning of each scheduling session. For example, hosts that are down (unavail or unreach), closed by the administrator (closed_adm), or closed due to a load threshold (closed_busy) are unusable by any job and can be removed from consideration. Removing these hosts early in the scheduling session helps with performance. Hosts with all slots occupied (closed_full) are not removed, since these hosts can still be used by jobs in preemptive queues if queue-based preemption is enabled. For sites without preemption configured, it is not necessary for LSF to consider full hosts. When you set LSB_SKIP_FULL_HOSTS=Y, LSF removes full hosts from consideration at the beginning of each scheduling session, as long as either the preemption plug-in is not loaded or there is no preemption relationship between queues (for more details, see the PREEMPTION parameter in lsb.queues).
3. LSB_DISABLE_PROJECT_LIMITS=Y (lsf.conf) Internally, LSF puts jobs with like attributes into "buckets" for scheduling efficiency. The idea is that if LSF cannot dispatch one job in a job bucket during a scheduling session, LSF can generally assume that the rest of the jobs in the same bucket cannot be dispatched either, and therefore do not need to be considered. In general, fewer job buckets leads to better scheduling performance. Use "badmin perfmon view" to see the current number of job buckets in your cluster. By default, LSF separates jobs with different projects (given by the bsub -P option) into different buckets. The reason for this is to handle project-based limits, for example, a limit of 25 slots on project P1. If you do not configure project limits, you can set LSB_DISABLE_PROJECT_LIMITS=Y to prevent LSF from separating jobs into buckets based on project name. When this is enabled, LSF ignores configured project limits.
4. LSB_FAST_REQUEST_NEW_JOBS=Y (lsf.conf) This parameter reduces the time taken to communicate newly submitted jobs from mbatchd to mbschd.
5. LSB_SHARE_LOCATION_ENH=Y (lsf.conf) This parameter improves LSF performance by reducing the sizes of messages passed between mbatchd and mbschd. By default, in messages between these daemons, each instance of a shared resource is identified by the list of hosts corresponding to that instance. After you set LSB_SHARE_LOCATION_ENH=Y, each instance is assigned an integer ID that is used for communication. This enhancement is especially useful for sites with several configured shared resources.
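As a sketch only (the parameter names come from this fix; enabling all five together is an illustration, not a recommendation), a cluster could turn on the full set in lsf.conf and then watch the job bucket count:

# lsf.conf (sketch): enable all five scheduler performance enhancements
LSB_SHARED_RSRC_ENH=Y
LSB_SKIP_FULL_HOSTS=Y
LSB_DISABLE_PROJECT_LIMITS=Y   # only if no project-based limits are configured
LSB_FAST_REQUEST_NEW_JOBS=Y
LSB_SHARE_LOCATION_ENH=Y

After restarting mbatchd, running badmin perfmon start followed by badmin perfmon view shows the current number of job buckets, which is the metric that LSB_DISABLE_PROJECT_LIMITS=Y is intended to keep low.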

Component mbschd Lower mbschd performance has a large impact on the job dispatching rate in a cluster.

P Date Incorrect behavior occurs when using bsub -I to submit a job containing "&&" in the command line (for example, bsub -I echo "test&&mg"). Component bsub The job command is changed by LSF when specifying && in the command line.

P Date A job with span[ptile=x] may cause an mbschd core dump in some cases. Component mbschd mbschd core dumps regularly, which causes a low job dispatch rate.

P Date When more than 1000 resources are defined in lsf.shared, lsadmin and badmin core dump. Component lsadmin badmin lsadmin and badmin cannot be used to start up LSF daemons when more than 1000 resources are defined.

P Date When running badmin reconfig, mbatchd fails to receive the system user name or system group name. Therefore, when badmin reconfig is finished, jobs cannot be submitted by specifying a group name using -G due to an "Unknown user or user group" error.

Component sbatchd New jobs cannot be submitted after running badmin reconfig.

P Date Dynamic hosts should not change to "closed_inactive" status if the host is not a Platform MultiCluster host. A dynamic host cannot be used after it is changed to "closed_inactive".

P Date Guaranteed resources are not being held when there is a host in the closed_busy state. Component mbschd schmod_default.so Configured guarantees cannot be held.

P Date An issue with the LSF library can cause the nios process to enter a busy loop, which results in issues with MPI job performance. This issue is triggered when the stdin of blaunch is neither /dev/null nor a FIFO. When running fluent under Platform MPI, the stdin of blaunch is redirected to a socket, which triggers this issue. Component nios Configured guarantee cannot be held.

P Date The Performance Application Programming Interface (PAPI) conflicts with the hardware counters collection functionality in LSF. When LSF_COLLECT_ENERGY_USAGE is set to "Y", energy jobs using PAPI cannot access hardware counters correctly. This fix updates the configuration syntax of LSF_COLLECT_ENERGY_USAGE as: LSF_COLLECT_ENERGY_USAGE=Y | N | ENERGY. When set to "Y", LSF collects both energy consumption and hardware counters for jobs. The default value "N" disables both collections for jobs. When set to "ENERGY", LSF only collects energy consumption for jobs; energy jobs using PAPI can run correctly with this option (see the example below). The user's application cannot get performance data while working with LSF.

P Date bjobs -l does not show the same effective resource requirement string if the originally specified resource requirement string is longer than 512 bytes. Component mbschd Without showing the same string in the effective resource requirement, end users may think the job is dispatched incorrectly.
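For example, a sketch based on the LSF_COLLECT_ENERGY_USAGE syntax described above, for a cluster whose energy-aware jobs rely on PAPI for hardware counters:

# lsf.conf (sketch): collect energy consumption only, leaving hardware
# counters free for PAPI-instrumented applications
LSF_COLLECT_ENERGY_USAGE=ENERGY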

P Date This fix adds a parameter to control the memory usage report when using cgroup (see the example below). LSB_CGROUP_MEM_INCLUDE_CACHE Configured in lsf.conf. Syntax: LSB_CGROUP_MEM_INCLUDE_CACHE=Y | N | y | n. When set to "Y/y", LSF includes rss and cache in the memory usage report when cgroup is enabled. When set to "N/n", LSF only includes rss in the memory usage report when cgroup is enabled. Default: Y. Component sbatchd res LSF reports that jobs are using more memory than they actually use, which causes jobs to be unexpectedly killed.

P Date Due to transient name resolution causing mbatchd/sbatchd communication issues, finished jobs are reported as running and cannot be killed with bkill. When the compute host's host name is incorrect, the master host cannot receive the job status. Therefore, the system keeps the job in a run status and it cannot be killed. This fix assumes that the host names configured for the LSF cluster are the same as the official names configured in the DNS server or /etc/hosts. Both host names (LSF cluster and DNS server) may include the domain (or not), but they must match. mbatchd keeps jobs in running status and end users cannot use bkill to kill their jobs.
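A minimal sketch of the cgroup memory-reporting parameter described above, assuming a site that wants rss-only accounting so that file cache does not count toward a job's reported memory usage:

# lsf.conf (sketch): report only rss (exclude cache) for cgroup-enabled jobs
LSB_CGROUP_MEM_INCLUDE_CACHE=N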

P Date User jobs return inconsistent values because whether the job script process is killed by the SIGXFSZ signal depends on whether the job command redirects stdout. This fix introduces the following parameter (see the example below): LSB_JOB_SCRIPT_TRAP_SIGNALS Configured in lsf.conf. Syntax: LSB_JOB_SCRIPT_TRAP_SIGNALS=signal_name ... A list of the names of signals that are trapped by the job scripts. This parameter prevents the specified signals from killing the job script process. By default, the job scripts trap the SIGTERM, SIGUSR1, SIGUSR2, SIGINT, and SIGHUP signals, so you do not have to define these signals in this parameter. Because the job scripts cannot trap the SIGSTOP and SIGKILL signals, these values are not valid. Valid values: a space-separated list of signal names. The first 31 signals are valid (from SIGHUP to SIGSYS), except for SIGSTOP and SIGKILL. This parameter is not supported on Windows platforms. Default: Undefined. The job script does not trap any additional signals except SIGTERM, SIGUSR1, SIGUSR2, SIGINT, and SIGHUP. Component sbatchd mbatchd keeps jobs in running status and end users cannot use bkill to kill their jobs.

P Date If a Platform MPI job is terminated because a task on the first node ran over the memory limit and is killed by cgroup memory fencing, the bjobs/bhist/bacct commands cannot display the job exit reason and finish resource usage properly. Component sbatchd Job accounting information is incorrect.
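As an illustration of the LSB_JOB_SCRIPT_TRAP_SIGNALS parameter described above (a sketch only; the choice of signal is an example), SIGXFSZ can be added to the trapped set so that job scripts whose commands hit the file size limit still return consistent values:

# lsf.conf (sketch): also trap SIGXFSZ in job scripts
LSB_JOB_SCRIPT_TRAP_SIGNALS=SIGXFSZ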

P Date bjobs/bhist sometimes show the wrong signal number when the job reaches the run limit. Component bjobs bhist End users do not know the actual job exit reason.

P Date When shared resources are configured only on dynamic hosts, the master LIM core dumps if all the dynamic hosts are removed from the cluster. Component lim LIM core dumps and LSF no longer works.

P Date After a dynamic host with exclusive resources joins the cluster, the exclusive resources disappear from the dynamic host after reconfiguring the LIM. Component lim After reconfiguring the LIM, the exclusive resources are lost from the dynamic host.

P Date The job run time recorded in the lsb.acct file is incorrect when the job is UNKNOWN and mbatchd is restarted. The incorrect job run time recorded in lsb.acct causes RTM to report incorrect job information.

P Date When submitting jobs with memory requirements and specifying "span[hosts=1]", if there are no hosts in the cluster that can meet the memory requirement, LSF still makes a slot reservation for the job. Component mbschd schmod_default.so A high-priority job reserves resources but cannot run on the host. This causes a waste of resources.

P Date When LSF kills a job that is part of a job dependency condition (that is, when LSF kills a job that other jobs depend on), mbatchd takes a long time to restart. mbatchd is busy evaluating job dependencies, which causes LSF to stop working.

P Date Job execution initialization fails if the execution host cannot resolve the submission host. This fix introduces the following parameter: LSB_DISABLE_SUB_HOST_LOOKUP Configured in lsf.conf. Syntax: LSB_DISABLE_SUB_HOST_LOOKUP=Y | N. Disables submission host name lookup when executing jobs. When this parameter is set, the job execution sbatchd does not look up the submission host name when executing or cleaning up the job. LSF will not be able to do any host-dependent automounting. Default: N. LSF looks up the submission host name when executing jobs. Component sbatchd badmin The job cannot run on some hosts.
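A minimal sketch of the parameter described above, for sites where execution hosts cannot resolve submission host names and do not rely on host-dependent automounting:

# lsf.conf (sketch): skip submission host name lookup on execution hosts
LSB_DISABLE_SUB_HOST_LOOKUP=Y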

P Date When resizing the terminal window of an interactive blaunch job, the job exits with the SIGPROF signal. Component res Parallel jobs are unexpectedly killed by LSF.

P Date The unit for the mem, swp, and tmp thresholds that lshosts displays is not changed after defining a different unit using LSF_UNIT_FOR_LIMITS in lsf.conf. Component lshosts The lshosts output is incorrect.

P Date When $LSF_ENVDIR is not set, elim.hpc does not check if /etc/lsf.conf exists. When $LSF_BINDIR is not set, elim.hpc has a security hole that can cause normal users to gain root permissions. Component elim.hpc elim.hpc does not work properly if $LSF_ENVDIR or $LSF_BINDIR is not set.

P Date If a blaunch job is submitted using 'bsub -i' with a large input file (larger than 8192 bytes), the job hangs. Component res

LSF jobs hang when specifying large input files.

P Date mbatchd may fail when any line of lsb.users is longer than 4352 characters. mbatchd core dumps and LSF no longer works.

P Date The job finish time includes the post-execution processing time, which impacts Platform RTM statistics. RTM does not report the correct job finish time.

P Date It takes more than 10 minutes to kill an array job with about 1000 elements. Component mbatchd Array jobs cannot be killed in a short time, which causes slots to be wasted.

P Date This fix improves job scheduling performance when there are several single-host parallel job buckets. Component mbschd schmod_parallel.so Lower mbschd performance has a large impact on the job dispatch rate in a cluster.

P Date If a user group is defined as being updated with egroup (using EGROUP_UPDATE_INTERVAL=1 in lsb.params) and a new user is added to the user group, the user's MAX_JOBS value does not display the correct value. Component mbatchd The wrong MAX_JOBS value applies to LSF users.

P Date When the status of hosts becomes UNAVAIL and there is an advance reservation defined on the hosts, warning messages do not clarify whether there are more slots reserved than are available on the host, or whether the problem is that the status became UNAVAIL. Component mbatchd The warning message is misleading and users do not know how to avoid it.

P Date bhosts and busers might report incorrect reserved slot values when time-based slot reservation is enabled. Component mbschd schmod_reserve.so Command output is wrong, which may confuse end users.

P Date If a job dependency condition is "ended(jobid)", the dependency is broken when the parent job is requeued. Job dependency is broken in some cases.

P Date A file descriptor limit that is set higher than is not respected by sbatchd and RES. Component sbatchd Jobs depending on a large number of open files fail to run.

P Date When $LSF_ENVDIR is not set, elim.hpc does not check if /etc/lsf.conf exists. When $LSF_BINDIR is not set, elim.hpc has a security hole that can cause normal users to gain root permissions. Component elim.hpc elim.hpc does not work properly if $LSF_ENVDIR or $LSF_BINDIR is not set.

P Date If a krb5 ticket renewal fails, there are not enough log messages to assist with troubleshooting. Component krbrenewd Insufficient warning and error messages make it more difficult to debug problems.

P Date After upgrading the LSF cluster to version 9.1.3, the process tracking information for jobs that were still unfinished before upgrading is lost and cannot be recovered. This is because LSF changed the cgroup information file name format, so the old cgroup information files are no longer recognized by LSF. Component sbatchd

Cannot collect jobs' run time usage information with cgroup enabled.

P Date mbschd performance might be slow when scheduling parallel jobs. Component mbschd schmod_default.so schmod_parallel.so schmod_reserve.so Lower mbschd performance has a large impact on the job dispatch rate in a cluster.

P Date When a newly installed LSF cluster starts up, the master elim may report the following error message in the log file: "readloadupdatefromsubelim: Protocol error: loadcnt cannot be read from elim". This error message is a false alarm. The root cause is that some elims may start but quickly exit with ELIM_ABORT_VALUE. A race condition might occur where the master elim reads the exited child elim process before receiving the child's SIGCHLD signal, in which case the read fails and the master elim displays this error message. Component melim The error message gives LSF administrators concerns about LSF product quality.

P Date When submitting jobs with memory requirements ("rusage[mem=value]") and processor requirements ("span[ptile=value]"), there is an incorrect reservation of hosts that do not have enough memory. Component mbschd schmod_parallel.so schmod_reserve.so A high-priority job reserves resources but cannot run on the host. This causes a waste of resources.

P Date When running SGI MPI jobs under pam, the CPU time report is incorrect. Component pam End users cannot get the right CPU time usage of their parallel jobs within pam.

P Date mbatchd does not accept user group names that end with a backslash ("/"). LSF administrators cannot configure user group names to be the same as user names.

P Date If the job has the span[ptile='!'] resource requirement, but the user who submitted the job did not define MXJ for any host type/model in lsb.hosts, and the user also did not specify a slot requirement for any host type/model in the span[] clause of the job's submission command:
- LSF or older versions ignore the span[ptile='!'] resource requirement and treat the job as an ordinary parallel job.
- LSF does not ignore the span[ptile='!'] requirement but treats this clause as span[ptile=1].
This fix restores the previous LSF behavior for handling span[ptile='!'] resource requirements. Some jobs are pending even though there are enough resources.

P Date When LSB_QUERY_ENH=Y is defined in lsf.conf and several queries are performed, the query child mbatchd might core dump.

Component mbatchd The child mbatchd core dumps, which causes b* query commands to fail.

P Date When the argument to blimits -u or -q is part of the actual user or queue name, the actual user or queue is still shown. This fix restricts the argument and does not make any expansion. Component blimits blimits -u or -q shows some limits information that it should not show.

P Date Whether the job script process is killed by the SIGXFSZ signal depends on whether the job command redirects stdout, leading to inconsistent return values for user jobs. This fix introduces the following parameter: LSB_JOB_SCRIPT_TRAP_SIGNALS Configured in lsf.conf. Syntax: LSB_JOB_SCRIPT_TRAP_SIGNALS=signal_name ... A list of the names of signals that are trapped by the job scripts. This parameter prevents the specified signals from killing the job script process. By default, the job scripts trap the SIGTERM, SIGUSR1, SIGUSR2, SIGINT, and SIGHUP signals, so you do not have to define these signals in this parameter. Because the job scripts cannot trap the SIGSTOP and SIGKILL signals, these values are not valid. Valid values: a space-separated list of signal names. The first 31 signals are valid (from SIGHUP to SIGSYS), except for SIGSTOP and SIGKILL. This parameter is not supported on Windows platforms. Default: Undefined. The job script does not trap any additional signals except SIGTERM, SIGUSR1, SIGUSR2, SIGINT, and SIGHUP. Component sbatchd mbatchd keeps jobs in a running status and end users cannot use bkill to kill their jobs.

P Date When running bpost on an execution cluster with a submission cluster, if the submission cluster mbatchd loses its connection during this time, the execution cluster mbatchd core dumps after the submission cluster mbatchd reconnects. mbatchd core dumps and LSF no longer works.

P Date bmgroup takes a long time to show new dynamic hosts, and it takes a long time (about 10 minutes) before the new dynamic hosts start accepting jobs. Component mbatchd It takes a long time for users to know that a dynamic host is ready to use.

P Date When mbatchd replays events and there are events that modify an entire job array to run in a large host group, mbatchd replays slowly and takes a long time to restart the cluster. LSF mbatchd starts up slowly and does not respond.

P Date If running in a host partition configured with a host group, mbatchd might core dump. mbatchd core dumps and LSF no longer works.

The following solutions have been delivered in LSF Version 9.1.3 Service Pack 2 between 30th May 2014 and 31st January 2015:

Date When using badmin ckconfig, LSF checks the host information from NIS or DNS. If the network is not stable and responds slowly, this process takes a long time, which causes mbatchd to stop responding. The following is the new parameter description (see the example below): Parameter name: IGNORE_HOSTNAME_CHECK in lsb.params. Syntax: IGNORE_HOSTNAME_CHECK=Y | yes | N | no. If this parameter is enabled, LSF ignores the check for host information in NIS or DNS. Default: N.

Date This fix allows LSF users or administrators to use wildcard characters in LSB_JOB_TMPDIR, JOB_SPOOL_DIR, job CWD, and job output directories, including the following characters:
- LSB_JOB_TMPDIR: %H
- JOB_SPOOL_DIR: %H, %P, %U, %C, and %JG
- Job CWD and output directories: %H
For more details on how to use these wildcard characters with LSF working on GPFS, refer to IBM Platform LSF Best Practices and Tips. Component sbatchd bparams

Date For Red Hat Enterprise Linux (RHEL) version 6.6 Beta and later, there is a MemAvailable field in /proc/meminfo. If MemAvailable is present, LSF reads this value directly from /proc/meminfo for the available memory load indicator instead of calculating the value. Component lim
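A sketch of the lsb.params settings described above; the expansion of %H to the execution host name is an assumption based on the wildcard list, not stated explicitly in the fix text:

# lsb.params (sketch)
IGNORE_HOSTNAME_CHECK=Y             # skip NIS/DNS host checks during badmin ckconfig
JOB_SPOOL_DIR=/gpfs/lsf/spool/%H    # assumed: %H expands to the execution host name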

Date This fix supports the -R option for the brestart command to let end users change the resource requirements of a restarted job. The syntax of the -R option of the brestart command is the same as the -R option of the bsub and bmod commands. Component brestart

Date Dumping the content of the job buckets to a file addresses the following issues: a smaller number of job buckets in the system might shorten the scheduling cycle; the total number of job buckets can be shown in the "badmin perfmon view" output, but there was no easy way to see the job buckets themselves; and there was no easy way to track down the cause of a large number of job buckets to help diagnose the problem. To generate the dump file containing all the current job buckets in the system, run badmin diagnose -c jobreq. The file contains the job buckets in XML format by default. The default file name "jobreq_<host_name>_<date_and_time>.xml" is used if "-f logfile_name" is not specified. The file location is DIAGNOSE_LOGDIR if configured in lsb.params; otherwise, the file is in LSF_LOGDIR. Component bapp badmin bhist bjobs bparams bqueues sbatchd mbatchd mbschd schmod_default.so schmod_parallel.so schmod_fairshare.so schmod_affinity.so schmod_advrsv.so schmod_dc.so
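For example (a sketch of the two commands described above; the checkpoint directory, job ID, file name, and resource string are illustrative):

# Restart a checkpointed job with a new resource requirement (brestart -R, added by this fix)
brestart -R "rusage[mem=4096]" /share/checkpoints 1234

# Dump all current job buckets to an XML file for diagnosis
badmin diagnose -c jobreq -f /tmp/jobreq_dump.xml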

Date This fix adds support for performing logic after a job is submitted by bsub or after a job is modified by bmod. Similar to how esub scripts are run before job submission or job modification, espub scripts are run after the operation. Component bsub bmod brestart mesub

Date When the LSF_NIOS_PEND_TIMEOUT environment variable is set, interactive jobs cannot be executed after the LSF_NIOS_PEND_TIMEOUT value expires. The job is killed and returns a message such as "Job <xxx> is being terminated". You can use the LSF_NIOS_DIE_CMD environment variable to specify a customized command and output message when the LSF_NIOS_PEND_TIMEOUT value expires. See the following example:
user@host1: setenv LSF_NIOS_PEND_TIMEOUT 1
user@host1: setenv LSF_NIOS_DIE_CMD "bkill %J > /dev/null; echo job %J is terminated by bkill;"
user@host1: echo $LSF_NIOS_DIE_CMD
bkill %J > /dev/null; echo job %J is terminated by bkill;
user@host1: bsub -I "echo test"
Job <16> is submitted to default queue <normal>.
<<Waiting for dispatch...>>
job 16 is terminated by bkill
About the LSF_NIOS_DIE_CMD environment variable: 1. The default value is "bkill jobid". 2. LSF_NIOS_DIE_CMD supports the %J variable, so you can use the job ID when you specify the custom command for LSF_NIOS_DIE_CMD. Component bsub

Date Improving job chunking usability addresses the following issues: A job's running time is not always predictable at the time of its submission. If such jobs are chunked but actually run for a very long time, other jobs in the same chunk are blocked in the chunk and wait for the long-running job to finish. There is no way to reschedule these waiting jobs even if there are enough free resources. Traditional LSF job chunking always chunks jobs together regardless of whether those jobs can run without being chunked; in some scenarios this impacts resource utilization.
Configuration: In lsb.queues or lsb.applications, configure the new parameter CHUNK_MAX_WAIT_TIME together with CHUNK_JOB_SIZE on some queues or application profiles. Syntax: CHUNK_MAX_WAIT_TIME=<seconds>. If a job is in WAIT status for longer than the configured time period, LSF removes the job from the job chunk and reschedules the job. The LSF scheduler ensures that such jobs are run instead of being chunked as a waiting member again when there are eligible resources. The application profile settings override queue-level configuration. Note: After a chunk job's waiting time exceeds CHUNK_MAX_WAIT_TIME, it may continue in WAIT status for one or more SBD_SLEEP_TIME cycles before being rescheduled. This is because sbatchd checks the timeout periodically, and the checking might be delayed if sbatchd is busy handling requests from mbatchd. In lsb.params, configure the new parameter ADAPTIVE_CHUNKING=Y to enable this feature. Note: This feature is not supported in the backfill and preemption phases in LSF. Component bapp badmin bhist bjobs bparams bqueues sbatchd mbatchd mbschd schmod_default.so schmod_parallel.so schmod_fairshare.so schmod_affinity.so schmod_advrsv.so schmod_dc.so
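A configuration sketch of the adaptive chunking settings described above (the queue name and values are illustrative only):

# lsb.params (sketch): enable adaptive chunking
ADAPTIVE_CHUNKING=Y

# lsb.queues (sketch): chunk up to 4 jobs per slot, but remove and reschedule
# any chunk member that has waited longer than 10 minutes
Begin Queue
QUEUE_NAME          = short
CHUNK_JOB_SIZE      = 4
CHUNK_MAX_WAIT_TIME = 600
End Queue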

Date This fix adds support for expanding the allremote keyword that appears in the HOST column of the bmgroup output. By expanding allremote, bmgroup displays leased-in hosts from other clusters instead of the allremote keyword. To enable the feature, define LSB_BMGROUP_ALLREMOTE_EXPAND=Y in the appropriate configuration file to expand the "allremote" keyword in the bmgroup output to display leased-in hosts. To enable "allremote" to be expanded for all users, edit lsf.conf and define LSB_BMGROUP_ALLREMOTE_EXPAND=Y. To enable "allremote" to be expanded only for a specific user, specify LSB_BMGROUP_ALLREMOTE_EXPAND=Y as an environment variable in the user's local environment before issuing the command. Component bmgroup
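For example (a sketch; the csh-style setenv mirrors the session example earlier in this document), a single user can enable the expansion for one session only:

# Expand "allremote" in bmgroup output for this session only
# (csh syntax; use "export LSB_BMGROUP_ALLREMOTE_EXPAND=Y" in sh/bash)
setenv LSB_BMGROUP_ALLREMOTE_EXPAND Y
bmgroup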

Date This fix adds support for showing the settings for pending time, interactive jobs, exclusive jobs, and run time limit, either by running bjobs -o pend_time, bjobs -o interactive, bjobs -o exclusive, or bjobs -o runtimelimit/rtlimit, or by adding pend_time, interactive, exclusive, or runtimelimit/rtlimit to LSB_BJOBS_FORMAT in lsf.conf. For example:
bjobs -o "jobid pend_time interactive exclusive runtimelimit"
JOBID PEND_TIME INTERACTIVE EXCLUSIVE RUNTIMELIMIT
1 20 Y N 100.0/host N Y -
1. For a pending job, the PEND_TIME is the current time minus the job's submission time.
2. For a dispatched (running or suspended) job, the PEND_TIME is the job's start time minus the job's submission time.
3. For a requeued, migrated, or rerun job, the PEND_TIME is the current time (redispatched time) minus the job's requeued, migrated, or rerun time.
4. Jobs that are submitted with the following bsub options are treated as interactive jobs: -I, -Ip, -Is, -IS, -ISp, -ISs, -IX.
5. bjobs -o exclusive shows Y for jobs that are submitted with the -x option, a compute unit exclusive request, or an affinity exclusive request.
6. The RUNTIMELIMIT is the merged value of the job-level run time limit assignment, the application-level run time limit setting, and the queue-level run time limit setting. If ABS_RUNLIMIT is enabled, the RUNTIMELIMIT is not normalized by the host CPU factor.
7. For IBM Platform LSF MultiCluster ("MultiCluster") with the job-level run time limit specified, "bjobs -o runtimelimit" shows the normalized run time on both the submission cluster and the execution cluster. Defining the run time limit at the application or queue level in the submission cluster does not affect the job's run time on the job execution cluster, so defining it in the submission cluster is meaningless. However, when defining the run time limit at the application or queue level in the submission cluster, running "bjobs -o runtimelimit" in the submission cluster still shows the combined run time limit of the submission cluster as being different from the effective run time limit at the execution cluster, while running "bjobs -o runtimelimit" in the execution cluster shows the effective run time limit.
Component bjobs
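As a sketch of the equivalent cluster-wide setting (the field order is illustrative; jobid, user, and stat are standard bjobs -o fields):

# lsf.conf (sketch): include the new fields in the default bjobs output format
LSB_BJOBS_FORMAT="jobid user stat pend_time interactive exclusive runtimelimit"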

26503 Date This enhancement allows the system to kill the job that is using the most CPU if the average logical-CPU r15m value and the ut value both reach a configured threshold on the host. This allows other jobs on the host to run smoothly. A job is considered the worst CPU-offending job on a host if it is using the most CPU (system time + user time) per average assigned slot during the check period. When a job is killed as the worst CPU-offending job, the exit reason is the same as when a job reaches its normal CPU limit: "job killed after reaching LSF CPU usage limit". Here, Smart CPU Usage Enforcement is considered a special case of the normal CPU limit function, applied at the host level. This solution is configured through a new configuration parameter in lsf.conf (see the sketch below):
LSB_CPU_USAGE_ENF_CONTROL=<Average Logical CPU r15m Threshold>:<UT Threshold>:<Check Interval>
1) Average Logical CPU r15m Threshold: A threshold value for the maximum limit of the quotient of the host's lsload r15m value and the count of the host's logical CPUs. This is the average CPU run queue length during the last 15 minutes for one logical CPU on the host. It must be a floating-point number equal to or greater than zero (0), for example, 7.8, 2.1, or 0.9.
2) UT Threshold: A threshold for the maximum limit of the host's lsload ut value. The ut value is the CPU utilization exponentially averaged over the last minute, between 0 and 1. It must be a floating-point number between 0 and 1, for example, 0.4, 0.5, or 0.24.
3) Check Interval: The smallest period of time during which the host's r15m and ut information is not checked between two consecutive checking cycles. This value must not be less than the value of SBD_SLEEP_TIME, and the unit is seconds. For example, 20, 40, or 60.
4) The host is considered to be in CPU overload when both the <Average Logical CPU r15m Threshold> and the <UT Threshold> have been reached.
5) This parameter does not affect jobs running across multiple hosts.
Default: Not defined. Component sbatchd
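A sketch of the parameter with illustrative threshold values (the numbers are examples, not recommendations):

# lsf.conf (sketch): treat a host as CPU-overloaded when the per-core r15m run
# queue length exceeds 2.0 and CPU utilization (ut) exceeds 0.9, checking no
# more often than every 60 seconds (must be >= SBD_SLEEP_TIME)
LSB_CPU_USAGE_ENF_CONTROL=2.0:0.9:60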

Date LSF supports the global fairshare scheduling policy. LSF's global fairshare policy divides the processing power of Platform MultiCluster (MultiCluster) and the LSF/XL feature of Platform LSF Advanced Edition among users to provide fair access to all resources, so that every user can use the resources of multiple clusters according to their configured shares. Global fairshare is supported in Platform LSF Standard Edition and Platform LSF Advanced Edition. Component mbatchd sbatchd mbschd gpolicyd badmin bgpinfo bqueues schmod_advrsv.so schmod_affinity.so schmod_aps.so schmod_bluegene.so schmod_cpuset.so schmod_craylinux.so schmod_crayx1.so schmod_dc.so schmod_default.so schmod_dist.so schmod_fairshare.so schmod_fcfs.so schmod_jobweight.so schmod_limit.so schmod_mc.so schmod_parallel.so schmod_preemption.so schmod_pset.so schmod_ps.so schmod_reserve.so schmod_rms.so schmod_xl.so libbat.a libbat.so liblsf.a liblsf.so lsbatch.h

Copyright and trademark information
Copyright IBM Corporation 2015
U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
IBM, the IBM logo and ibm.com are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.


More information

(MCQZ-CS604 Operating Systems)

(MCQZ-CS604 Operating Systems) command to resume the execution of a suspended job in the foreground fg (Page 68) bg jobs kill commands in Linux is used to copy file is cp (Page 30) mv mkdir The process id returned to the child process

More information

Contents. Error Message Descriptions... 7

Contents. Error Message Descriptions... 7 2 Contents Error Message Descriptions.................................. 7 3 4 About This Manual This Unify DataServer: Error Messages manual lists the errors that can be produced by the Unify DataServer

More information

Client Installation and User's Guide

Client Installation and User's Guide IBM Tivoli Storage Manager FastBack for Workstations Version 7.1.1 Client Installation and User's Guide SC27-2809-04 IBM Tivoli Storage Manager FastBack for Workstations Version 7.1.1 Client Installation

More information

Best practices. Deploying IBM Platform LSF on a Linux HPC Cluster. IBM Platform LSF

Best practices. Deploying IBM Platform LSF on a Linux HPC Cluster. IBM Platform LSF IBM Platform LSF Best practices Deploying IBM Platform LSF on a Linux HPC Cluster Jin Ma Software Developer: LSF Systems & Technology Group Chong Chen Principal Architect: LSF Product Family Systems &

More information

/6)%DWFK$GPLQLVWUDWRU V4XLFN 5HIHUHQFH

/6)%DWFK$GPLQLVWUDWRU V4XLFN 5HIHUHQFH /6)%DWFK$GPLQLVWUDWRU V4XLFN 5HIHUHQFH Version 3.2 3ODWIRUP&RPSXWLQJ&RUSRUDWLRQ /6)%DWFK$GPLQLVWUDWRU V4XLFN5HIHUHQFH Copyright 1994-1998 Platform Computing Corporation All rights reserved. This document

More information

SmartSuspend. Achieve 100% Cluster Utilization. Technical Overview

SmartSuspend. Achieve 100% Cluster Utilization. Technical Overview SmartSuspend Achieve 100% Cluster Utilization Technical Overview 2011 Jaryba, Inc. SmartSuspend TM Technical Overview 1 Table of Contents 1.0 SmartSuspend Overview 3 2.0 How SmartSuspend Works 3 3.0 Job

More information

ff5f5b56ce55bcf0cbe4daa5b412a72e SqlGuard-9.0p530_64-bit.tgz.enc

ff5f5b56ce55bcf0cbe4daa5b412a72e SqlGuard-9.0p530_64-bit.tgz.enc Problem Overview ================ Product: Guardium Release: 9.0/9.5 Fix ID#: Guardium v9.0 p530 r78220 Fix Completion Date: 2015-07-06 Description: Combined Fix Pack for v9.0 GPU 500 (Jun 29 2015) MD5Sums/

More information

Installation Instructions for Platform Suite for SAS Version 7.1 for UNIX

Installation Instructions for Platform Suite for SAS Version 7.1 for UNIX Installation Instructions for Platform Suite for SAS Version 7.1 for UNIX Copyright Notice The correct bibliographic citation for this manual is as follows: SAS Institute Inc., Installation Instructions

More information

IBM Tivoli Storage Manager HSM for Windows Version 7.1. Messages

IBM Tivoli Storage Manager HSM for Windows Version 7.1. Messages IBM Tivoli Storage Manager HSM for Windows Version 7.1 Messages IBM Tivoli Storage Manager HSM for Windows Version 7.1 Messages Note: Before using this information and the product it supports, read the

More information

Programs. Program: Set of commands stored in a file Stored on disk Starting a program creates a process static Process: Program loaded in RAM dynamic

Programs. Program: Set of commands stored in a file Stored on disk Starting a program creates a process static Process: Program loaded in RAM dynamic Programs Program: Set of commands stored in a file Stored on disk Starting a program creates a process static Process: Program loaded in RAM dynamic Types of Processes 1. User process: Process started

More information

Release Notes for Patches for the MapR Release

Release Notes for Patches for the MapR Release Release Notes for Patches for the MapR 5.0.0 Release Release Notes for the December 2016 Patch Released 12/09/2016 These release notes describe the fixes that are included in this patch. Packages Server

More information

IBM Spectrum LSF Version 10 Release 1.0. Using IBM Spectrum LSF License Scheduler IBM SCNN-NNNN-00

IBM Spectrum LSF Version 10 Release 1.0. Using IBM Spectrum LSF License Scheduler IBM SCNN-NNNN-00 IBM Spectrum LSF Version 10 Release 1.0 Using IBM Spectrum LSF License Scheduler IBM SCNN-NNNN-00 IBM Spectrum LSF Version 10 Release 1.0 Using IBM Spectrum LSF License Scheduler IBM SCNN-NNNN-00 Note

More information

IRIX Resource Management Plans & Status

IRIX Resource Management Plans & Status IRIX Resource Management Plans & Status Dan Higgins Engineering Manager, Resource Management Team, SGI E-mail: djh@sgi.com CUG Minneapolis, May 1999 Abstract This paper will detail what work has been done

More information

/6)3URJUDPPHUV*XLGH. Version 3.2 Fourth Edition, August ODWIRUP&RPSXWLQJ&RUSRUDWLRQ

/6)3URJUDPPHUV*XLGH. Version 3.2 Fourth Edition, August ODWIRUP&RPSXWLQJ&RUSRUDWLRQ /6)3URJUDPPHUV*XLGH Version 3.2 Fourth Edition, August 1998 3ODWIRUP&RPSXWLQJ&RUSRUDWLRQ /6)3URJUDPPHU V*XLGH Copyright 1994-1998 Platform Computing Corporation All rights reserved. This document is copyrighted.

More information

Enabling ARM Instrumentation for Platform LSF and Platform Process Manager for SAS. November 2006

Enabling ARM Instrumentation for Platform LSF and Platform Process Manager for SAS. November 2006 Enabling ARM Instrumentation for Platform LSF and Platform Process Manager for SAS November 2006 Copyright Document redistribution and translation Internal redistribution Trademarks Third-party license

More information

Ch 4 : CPU scheduling

Ch 4 : CPU scheduling Ch 4 : CPU scheduling It's the basis of multiprogramming operating systems. By switching the CPU among processes, the operating system can make the computer more productive In a single-processor system,

More information

Grid Computing in SAS 9.4

Grid Computing in SAS 9.4 Grid Computing in SAS 9.4 SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2013. Grid Computing in SAS 9.4. Cary, NC: SAS Institute Inc. Grid Computing

More information

Source OID Message Severity Cause Action

Source OID Message Severity Cause Action 13 CHAPTER This section describes the Prime Network system events. System events appear in the Prime Network Events System tab. They include a variety of events pertaining to the system activities, from

More information

SUDO(8) System Manager s Manual SUDO(8)

SUDO(8) System Manager s Manual SUDO(8) NAME sudo, sudoedit - execute a command as another user SYNOPSIS sudo -h -K -k -V sudo -v [-AknS] [-a type] [-g group] [-h host] [-p prompt] [-u user] sudo -l [-AknS] [-a type] [-g group] [-h host] [-p

More information

Migrate Platform LSF to Version 7 on Windows. Platform LSF Version 7.0 Update 6 Release date: August 2009 Last modified: August 17, 2009

Migrate Platform LSF to Version 7 on Windows. Platform LSF Version 7.0 Update 6 Release date: August 2009 Last modified: August 17, 2009 Migrate Platform LSF to Version 7 on Windows Platform LSF Version 7.0 Update 6 Release date: August 2009 Last modified: August 17, 2009 Copyright 1994-2009 Platform Computing Inc. Although the information

More information

Process management. What s in a process? What is a process? The OS s process namespace. A process s address space (idealized)

Process management. What s in a process? What is a process? The OS s process namespace. A process s address space (idealized) Process management CSE 451: Operating Systems Spring 2012 Module 4 Processes Ed Lazowska lazowska@cs.washington.edu Allen Center 570 This module begins a series of topics on processes, threads, and synchronization

More information

Migrating on UNIX and Linux

Migrating on UNIX and Linux Platform LSF Version 9 Release 1.3 Migrating on UNIX and Linux SC27-5318-03 Platform LSF Version 9 Release 1.3 Migrating on UNIX and Linux SC27-5318-03 Note Before using this information and the product

More information

Using LSF with Condor Checkpointing

Using LSF with Condor Checkpointing Overview Using LSF with Condor Checkpointing This chapter discusses how obtain, install, and configure the files needed to use Condor checkpointing with LSF. Contents Introduction on page 3 Obtaining Files

More information

Error Message Reference

Error Message Reference Security Policy Manager Version 7.1 Error Message Reference GC23-9477-01 Security Policy Manager Version 7.1 Error Message Reference GC23-9477-01 Note Before using this information and the product it

More information

Introduction to NCAR HPC. 25 May 2017 Consulting Services Group Brian Vanderwende

Introduction to NCAR HPC. 25 May 2017 Consulting Services Group Brian Vanderwende Introduction to NCAR HPC 25 May 2017 Consulting Services Group Brian Vanderwende Topics we will cover Technical overview of our HPC systems The NCAR computing environment Accessing software on Cheyenne

More information

Installation Instructions for Platform Suite for SAS Version 4.1 for UNIX

Installation Instructions for Platform Suite for SAS Version 4.1 for UNIX Installation Instructions for Platform Suite for SAS Version 4.1 for UNIX Copyright Notice The correct bibliographic citation for this manual is as follows: SAS Institute Inc., Installation Instructions

More information

Running the model in production mode: using the queue.

Running the model in production mode: using the queue. Running the model in production mode: using the queue. 1) Codes are executed with run scripts. These are shell script text files that set up the individual runs and execute the code. The scripts will seem

More information

Virtuoso Analog Distributed Processing Option User Guide. Product Version September 2008

Virtuoso Analog Distributed Processing Option User Guide. Product Version September 2008 Virtuoso Analog Distributed Processing Option User Guide Product Version 6.1.3 September 2008 1999 2008 Cadence Design Systems, Inc. All rights reserved. Printed in the United States of America. Cadence

More information

Introduction to High-Performance Computing (HPC)

Introduction to High-Performance Computing (HPC) Introduction to High-Performance Computing (HPC) Computer components CPU : Central Processing Unit CPU cores : individual processing units within a Storage : Disk drives HDD : Hard Disk Drive SSD : Solid

More information

OPERATING SYSTEM. Chapter 9: Virtual Memory

OPERATING SYSTEM. Chapter 9: Virtual Memory OPERATING SYSTEM Chapter 9: Virtual Memory Chapter 9: Virtual Memory Background Demand Paging Copy-on-Write Page Replacement Allocation of Frames Thrashing Memory-Mapped Files Allocating Kernel Memory

More information

Mid Term from Feb-2005 to Nov 2012 CS604- Operating System

Mid Term from Feb-2005 to Nov 2012 CS604- Operating System Mid Term from Feb-2005 to Nov 2012 CS604- Operating System Latest Solved from Mid term Papers Resource Person Hina 1-The problem with priority scheduling algorithm is. Deadlock Starvation (Page# 84) Aging

More information

IBM DB2 Query Patroller. Administration Guide. Version 7 SC

IBM DB2 Query Patroller. Administration Guide. Version 7 SC IBM DB2 Query Patroller Administration Guide Version 7 SC09-2958-00 IBM DB2 Query Patroller Administration Guide Version 7 SC09-2958-00 Before using this information and the product it supports, be sure

More information

TORQUE Resource Manager5.0.2 release notes

TORQUE Resource Manager5.0.2 release notes TORQUE Resource Manager release notes The release notes file contains the following sections: New Features on page 1 Differences on page 2 Known Issues on page 4 Resolved issues on page 4 New Features

More information

General Objectives: To understand the process management in operating system. Specific Objectives: At the end of the unit you should be able to:

General Objectives: To understand the process management in operating system. Specific Objectives: At the end of the unit you should be able to: F2007/Unit5/1 UNIT 5 OBJECTIVES General Objectives: To understand the process management in operating system Specific Objectives: At the end of the unit you should be able to: define program, process and

More information

Upgrading Platform LSF on UNIX and Linux. Platform LSF Version 8.0 June 2011

Upgrading Platform LSF on UNIX and Linux. Platform LSF Version 8.0 June 2011 Upgrading Platform LSF on UNIX and Linux Platform LSF Version 8.0 June 2011 Copyright 1994-2011 Platform Computing Corporation. Although the information in this document has been carefully reviewed, Platform

More information

Cube Analyst Drive. Release Summary. Citilabs

Cube Analyst Drive. Release Summary. Citilabs Cube Analyst Drive Release Summary Cube Analyst Drive Release Summary Citilabs Cube Analyst Drive Release Summary This section documents changes included in each release of Cube Analyst Drive. You may

More information

Scheduling in SAS 9.4, Second Edition

Scheduling in SAS 9.4, Second Edition Scheduling in SAS 9.4, Second Edition SAS Documentation September 5, 2017 The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2016. Scheduling in SAS 9.4, Second Edition.

More information

Batches and Commands. Overview CHAPTER

Batches and Commands. Overview CHAPTER CHAPTER 4 This chapter provides an overview of batches and the commands contained in the batch. This chapter has the following sections: Overview, page 4-1 Batch Rules, page 4-2 Identifying a Batch, page

More information

CSE 451: Operating Systems Winter Module 4 Processes. Mark Zbikowski Allen Center 476

CSE 451: Operating Systems Winter Module 4 Processes. Mark Zbikowski Allen Center 476 CSE 451: Operating Systems Winter 2015 Module 4 Processes Mark Zbikowski mzbik@cs.washington.edu Allen Center 476 2013 Gribble, Lazowska, Levy, Zahorjan Process management This module begins a series of

More information

IBM. Systems management Disk management. IBM i 7.1

IBM. Systems management Disk management. IBM i 7.1 IBM IBM i Systems management Disk management 7.1 IBM IBM i Systems management Disk management 7.1 Note Before using this information and the product it supports, read the information in Notices, on page

More information

McAfee Enterprise Security Manager

McAfee Enterprise Security Manager Release Notes McAfee Enterprise Security Manager 10.0.2 Contents About this release New features Resolved issues Instructions for upgrading Find product documentation About this release This document contains

More information

LSF Make. Platform Computing Corporation

LSF Make. Platform Computing Corporation LSF Make Overview LSF Make is only supported on UNIX. LSF Batch is a prerequisite for LSF Make. The LSF Make product is sold, licensed, distributed, and installed separately. For more information, contact

More information

Cube Analyst Drive. Release Summary. Citilabs. This section documents changes included in each release of Cube Analyst Drive.

Cube Analyst Drive. Release Summary. Citilabs. This section documents changes included in each release of Cube Analyst Drive. Cube Analyst Drive Release Summary Cube Analyst Drive Release Summary Citilabs Cube Analyst Drive Release Summary This section documents changes included in each release of Cube Analyst Drive. You may

More information

RSA Authentication Manager Adapter User Guide

RSA Authentication Manager Adapter User Guide IBM Security Identity Manager Version 6.0 RSA Authentication Manager Adapter User Guide SC27-4409-04 IBM Security Identity Manager Version 6.0 RSA Authentication Manager Adapter User Guide SC27-4409-04

More information