Fixed Bugs for IBM Platform LSF Version Fix Pack 3


The following bugs have been fixed in LSF Version Fix Pack 3 between 30th May 2014 and 8th June 2015:

P Date
When executing a job containing multiple tasks, the task RES calculates an incorrect XDR size and causes an XDR encoding error.
Component: res
Impact: Jobs that are launched by blaunch fail to execute.

P Date
In MultiCluster lease mode, a buffer overflow occurs when the lsb.lease.state file is too large, causing an mbatchd core dump.
Impact: mbatchd core dumps and LSF is unrecoverable.

P Date
In MultiCluster forward mode, a parallel job submitted with the same section as RES_REQ is blocked.
Component: mbschd schmod_mc.so
Impact: Jobs remain pending indefinitely on the submission cluster.

P Date
When the lsb_submit() API is used to submit jobs for both a parent and a child process, jobs submitted for the child process do not process LSB_SUB_MODIFY_FILE and LSB_SUB_MODIFY_ENVFILE properly.
Component: liblsf.a liblsb.so libbat.a libbat.so lsbatch.h lsf.h
Impact: esub does not work with the lsb_submit API.

P Date
When checkpoint jobs are submitted by script, brestart -W does not work and restarted jobs cannot be terminated by RUNLIMIT.
Component: sbatchd erestart
Impact: After running brestart, the job cannot exit and keeps showing "Checkpoint initiated" and "Checkpoint succeeded" in sequence.

P Date
When mbschd encounters an error on a job, other jobs do not get scheduled and remain pending.
Component: mbschd
Impact: Jobs remain pending until an administrator runs badmin reconfig.

P Date
There are problems with guarantee and preemption: consumer jobs that are high priority and guaranteed remain pending even when resource requirements are met.
Component: mbschd schmod_default.so
Impact: Some pending jobs cannot run even when resource requirements are met.

P Date
When using bsub to submit a job, a core dump occurs if the specified command and its arguments contain multiple quotations.
Component: bsub
Impact: Cannot submit jobs if the command and its arguments contain multiple quotations.

P Date
If there are many bhosts requests and an affinity host in the cluster (or if affinity is enabled in the cluster), after enabling LSB_QUERY_ENH in lsf.conf, the query child mbatchd core dumps repeatedly. The core dump is caused by a thread-unsafe function.
Impact: Child mbatchd core dumps, which causes b* query commands to fail.

P Date
When a license error occurs ("Unable to contact LIM"), mbatchd exits.
Impact: The error message is confusing.

P Date
When preemption and guaranteed SLA are both enabled, mbschd takes a long time to finish one scheduling cycle, especially when there are tens of thousands of pending jobs.
Component: mbschd
Impact: An mbschd performance issue causes low job throughput in the LSF cluster.

P Date
When LSF_TMPDIR is set to a shared file system, esub sometimes does not work because the temp file for one job is overwritten by another job.
Component: bsub
Impact: Wrong or missing job submission options are set in esub.

P Date
When shared resources are configured for the cluster, vemkd reports a warning message: "lsfinit: resource <resource_name> is being used by multiple hosts. It cannot be used in a resource requirement expression."
Component: vemkd
Impact: Several error messages are logged in the vemkd log file even with a correct configuration.

P Date
If sbatchd is not responding or is unavailable when mbatchd attempts to send modification information to it, sbatchd never receives the modification information.
Component: sbatchd
Impact: Using bmod to change a running job's run time limit does not take effect when sbatchd is unavailable.

P Date
In a mixed cluster environment, using a bpeek command on a job that is running on another host occasionally fails.
Component: bpeek
Impact: bpeek occasionally does not work.

P Date
When using the brestart command with GOLD integration jobs, the command fails due to missing job information such as a project name or job ID.
Component: brestart
Impact: Jobs restarted with brestart do not work well with GOLD integration.

P Date
When running a long pre-execution job, if the job is killed before the pre-execution script finishes, eexec cannot get the environment variable LSF_JOB_EXECUSER.
Component: sbatchd
Impact: Several GOLD reservations are not released if gcharge fails and the job is killed.

P Date
Cannot backfill the reserved job when the job is an exclusive job.
Component: mbschd schmod_reserve.so
Impact: Short jobs cannot use backfill slots.

Date
1. When advance reservation files exist (lsb.rsv.id, lsb.rsv.stat), LSF should log messages at the ERROR level if mbatchd fails to open the files for reading.
2. When advance reservation files do not exist, LSF should log messages at the LOG_INFO level.
3. mbatchd should always use the LSF primary administrator account to access these files under LSB_SHAREDIR. Currently, when mbatchd starts up, mbatchd uses root to read advance reservation files.
Impact: Users do not know when the advance reservation file is not accessible.

P Date
Performance enhancements for the LSF scheduler. Each is enabled with a parameter in lsf.conf:

1. LSB_SHARED_RSRC_ENH=Y
LSF allows you to configure multiple instances of a (site-defined) shared resource. For example, for shared resource "R", there may be one instance consisting of 10 units of R that is available on hosts 1 and 2, and a second instance consisting of 10 units of R that is available on hosts 3 and 4. Each host can be associated with at most one instance. If a job specifies a shared resource in its rusage string and LSF discovers that the job cannot use one host because of a lack of the resource, other hosts are also checked, since there may be multiple instances of the resource. In the special case of a single resource instance for the cluster (for example, representing a floating software license), LSF would ideally not consider any other hosts for the job. When you set LSB_SHARED_RSRC_ENH=Y, after LSF finds that an insufficient amount of a single-instance shared resource is available on one host, LSF will not consider other hosts for the job.

2. LSB_SKIP_FULL_HOSTS=Y
LSF removes unusable hosts from consideration at the beginning of each scheduling session. For example, hosts that are down (unavail or unreach), closed by the administrator (closed_adm), or closed due to a load threshold (closed_busy) are unusable by any job and can be removed from consideration. Removing these hosts early in the scheduling session improves performance. Hosts with all slots occupied (closed_full) are not removed, since these hosts can still be used by jobs in preemptive queues, if queue-based preemption is enabled. For sites without preemption configured, it is not necessary for LSF to consider full hosts. When you set LSB_SKIP_FULL_HOSTS=Y, LSF removes full hosts from consideration at the beginning of each scheduling session, as long as either the preemption plug-in is not loaded or there is no preemption relationship between queues. For more details, see the PREEMPTION parameter in lsb.queues.

3. LSB_DISABLE_PROJECT_LIMITS=Y
Internally, LSF puts jobs with like attributes into "buckets" for scheduling efficiency. The idea is that if LSF cannot dispatch one job in a job bucket during a scheduling session, LSF can generally assume that the rest of the jobs in the same bucket cannot be dispatched either, and therefore do not need to be considered. In general, fewer job buckets leads to better scheduling performance. Use "badmin perfmon view" to see the current number of job buckets in your cluster. By default, LSF separates jobs with different projects (given by the bsub -P option) into different buckets. The reason for this is to handle project-based limits, for example, a limit on project P1 of 25 slots. If you do not configure project limits, you can set LSB_DISABLE_PROJECT_LIMITS=Y to prevent LSF from separating jobs into buckets based on project name. When this is enabled, LSF will ignore any configured project limits.

4. LSB_FAST_REQUEST_NEW_JOBS=Y
This parameter reduces the time taken in communicating newly submitted jobs from mbatchd to mbschd.

5. LSB_SHARE_LOCATION_ENH=Y
This parameter improves LSF performance by reducing the sizes of messages passed between mbatchd and mbschd. By default, messages between these daemons identify each instance of a shared resource by the list of hosts corresponding to that instance. After you set LSB_SHARE_LOCATION_ENH=Y, each instance is assigned an integer ID that is used for communication. This enhancement is especially useful for sites with several configured shared resources.

Component: mbschd
Impact: Lower mbschd performance has a large impact on the job dispatching rate in a cluster.

P Date
This fix introduces five performance enhancements for the LSF scheduler. You must individually enable each of the following enhancements with a parameter in lsf.conf:

1. LSB_SHARED_RSRC_ENH=Y
LSF allows you to configure multiple instances of a (site-defined) shared resource. For example, for a shared resource "R", there may be one instance consisting of 10 units of R that is available on hosts 1 and 2, and a second instance consisting of 10 units of R that is available on hosts 3 and 4. Each host can be associated with at most one instance. If a job specifies a shared resource in its rusage string and LSF discovers that the job cannot use one host because of a lack of the resource, other hosts are also checked, since there may be multiple instances of the resource. In the special case of a single resource instance for the cluster (for example, representing a floating software license), LSF would ideally not consider any other hosts for the job. When you set LSB_SHARED_RSRC_ENH=Y, after LSF finds that an insufficient amount of a single-instance shared resource is available on one host, LSF will not consider other hosts for the job.

2. LSB_SKIP_FULL_HOSTS=Y
LSF removes unusable hosts from consideration at the beginning of each scheduling session. For example, hosts that are down (unavail or unreach), closed by the administrator (closed_adm), or closed due to a load threshold (closed_busy) are unusable by any job and can be removed from consideration. Removing these hosts early in the scheduling session helps with performance. Hosts with all slots occupied (closed_full) are not removed, since these hosts can still be used by jobs in preemptive queues, if queue-based preemption is enabled. For sites without preemption configured, it is not necessary for LSF to consider full hosts. When you set LSB_SKIP_FULL_HOSTS=Y, LSF removes full hosts from consideration at the beginning of each scheduling session, as long as either the preemption plug-in is not loaded or there is no preemption relationship between queues (for more details, see the PREEMPTION parameter in lsb.queues).

Component: mbatchd mbschd schmod_default.so schmod_reserve.so schmod_preemption.so schmod_affinity.so schmod_parallel.so schmod_advrsv.so schmod_aps.so schmod_bluegene.so schmod_cpuset.so schmod_craylinux.so schmod_crayx1.so schmod_dc.so schmod_dist.so schmod_fairshare.so schmod_fcfs.so schmod_jobweight.so schmod_limit.so schmod_mc.so schmod_ps.so schmod_pset.so schmod_rms.so schmod_xl.so
Impact: Lower mbschd performance has a large impact on the job dispatching rate in a cluster.

P Date
An issue with the LSF library can cause the NIOS process to enter a busy loop, resulting in issues with MPI job performance. This issue is triggered when the stdin of blaunch is neither /dev/null nor a FIFO. When running fluent under Platform MPI, the stdin of blaunch is redirected to a socket, which triggers this issue.
Component: nios
Impact: MPI job performance is affected.

P Date
Incorrect behavior when using bsub -I to submit a job containing "&&" in the command line (for example, bsub -I echo "test&&mg").
Component: bsub
Impact: Job command is changed by LSF if && is used in the command line.
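The five scheduler performance parameters described in the entries above are all set in lsf.conf. A minimal sketch of enabling them together (whether to combine them is a site decision; the parameter names and values are the documented ones):

```
# lsf.conf - scheduler performance enhancements from this fix pack
LSB_SHARED_RSRC_ENH=Y          # stop probing other hosts for single-instance shared resources
LSB_SKIP_FULL_HOSTS=Y          # drop closed_full hosts early (no queue preemption configured)
LSB_DISABLE_PROJECT_LIMITS=Y   # do not split job buckets by project; ignores project limits
LSB_FAST_REQUEST_NEW_JOBS=Y    # faster hand-off of new jobs from mbatchd to mbschd
LSB_SHARE_LOCATION_ENH=Y       # integer IDs instead of host lists in daemon messages
```

Note that these parameters generally require restarting the batch daemons (for example, badmin mbdrestart) to take effect, and that LSB_DISABLE_PROJECT_LIMITS=Y causes LSF to ignore any configured project limits.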

P Date
If more than 1000 resources are defined in lsf.shared, then lsadmin and badmin core dump.
Component: lsadmin badmin
Impact: Cannot use lsadmin and badmin to start up LSF daemons when more than 1000 resources are defined in lsf.shared.

P Date
A job with span[ptile=x] may cause an mbschd core dump.
Component: mbschd
Impact: mbschd core dumps, causing a low job dispatch rate.

P Date
When a parallel job finishes, LSF reports an incorrect MAXMEM value.
Component: res
Impact: LSF reports a larger than actual memory usage, impacting the analysis of parallel jobs.

P Date
When brequeue is used to requeue a job to pending status, the run time value in lsb.stream is incorrect. Therefore, the job is killed before resuming.
Impact: Requeued jobs that are killed while pending are logged in the Platform Analytics database with a long run time.

P Date
bjobs -l does not show the same effective resource requirement string if the originally-specified resource requirement string is longer than 512 bytes.
Component: mbschd
Impact: If a different string is shown for the effective resource requirement, a user may think the job is dispatched incorrectly.

P Date
Guaranteed resources are not held when there is a host in the closed_busy state.
Component: mbschd schmod_default.so
Impact: Configured guarantees cannot be held.

P Date
After running lsrun on some hosts, the lsload and lsload -E commands display incorrect r15s and r1m values for those hosts.
Component: lim
Impact: The lsload, lsload -E, and lsload -N commands display incorrect results on some hosts after running lsrun.

P Date
Compute unit resource requirements do not work with leased-in hosts.
1. This fix allows for the specification of leased-in hosts in the definition of a compute unit. For example:

Begin ComputeUnit
NAME    MEMBER        TYPE
en1     (host1@mc1)   (enclosure)
en2     (ho*@mc1)     (enclosure)
End ComputeUnit

Note: A valid name of a leased-in host must be defined for the MEMBER column. The badmin reconfig command does not log an error or warning message if you specify an invalid host name. mbatchd only logs the error or warning message in the mbatchd log file after mbatchd gets the leased-in host information from the remote cluster.
2. This fix allows dynamic hosts and leased-in hosts to join a compute unit (after running badmin reconfig to apply the changes). This allows jobs with a compute unit resource requirement to be dispatched to the new dynamic hosts and leased-in hosts.
Component: mbschd
Impact: Users cannot specify compute unit resource requirements for leased-in hosts.

P Date
When there are no more records, lsb_readjobinfo() returns error code 53 instead of 47.
Component: liblsf.a liblsf.so libbat.a libbat.so lsf.h lsbatch.h
Impact: The LSF API client code fails to detect the case when there are no more job records in mbatchd.

P Date
If using both compute units and affinity, some jobs may cause an mbschd core dump.
Component: mbschd schmod_parallel.so schmod_reserve.so
Impact: mbschd core dumps, causing a low job dispatch rate.

P Date
CPU binding does not work with LSF_BIND_JOB after LSF is started using the 'lsf_daemons start' command.
Component: sbatchd
Impact: Jobs cannot be bound after running 'lsf_daemons start'.

P Date
When running badmin reconfig, mbatchd fails to receive the system user name or system group name. Therefore, after badmin reconfig, jobs that specify a group name using -G cannot be submitted, due to an "Unknown user or user group" error.
Component: sbatchd
Impact: New jobs specifying a group name cannot be submitted after running badmin reconfig.

P Date
A dynamic host changes to "closed_inactive" status if it is not a Platform MultiCluster host.
Impact: A dynamic host cannot be used after it is changed to "closed_inactive".

P Date
The Performance Application Programming Interface (PAPI) conflicts with hardware counter data collection in LSF. When LSF_COLLECT_ENERGY_USAGE in lsf.conf is set to "Y", jobs submitted with energy aware scheduling options and using PAPI do not trigger hardware counters correctly. This fix introduces the esub environment variable LSB_SUB4_COLLECT_ENERGY_USAGE, which allows LSF to collect energy-related usage data at the job level, narrowing down the cluster-level energy usage data collection.
Impact: Job does not trigger energy usage data collection.
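Based on the entry above, an esub could request energy usage collection per job. A hedged sketch of such an esub script: only the LSB_SUB4_COLLECT_ENERGY_USAGE variable name comes from the fix; the use of LSB_SUB_MODIFY_FILE and the exact value syntax are assumptions based on the general esub mechanism:

```
#!/bin/sh
# Hypothetical esub sketch: request energy usage collection for this job only.
# LSB_SUB_MODIFY_FILE is the file an esub writes submission option
# overrides to; the value syntax below is an assumption.
echo 'LSB_SUB4_COLLECT_ENERGY_USAGE = "Y"' >> "$LSB_SUB_MODIFY_FILE"
```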

P Date
bjobs and bhist sometimes show the wrong signal number when the job reaches the run limit.
Component: bjobs bhist
Impact: End users do not know the actual job exit reason.

P Date
When LSF kills a job that is part of a job dependency condition (that is, when LSF kills a job that other jobs depend on), mbatchd takes a long time to restart.
Impact: mbatchd is busy evaluating job dependencies, causing LSF to stop working.

P Date
When a dynamic host with exclusive resources joins the cluster, the exclusive resources disappear from the dynamic host after re-configuring the LIM.
Component: lim
Impact: After reconfiguring the LIM, exclusive resources are lost from the dynamic host.

P Date
Job finish time includes the post-execution processing time, which impacts Platform RTM statistics.
Impact: RTM does not report the correct job finish time.

P Date
When $LSF_ENVDIR is not set, elim.hpc does not check if /etc/lsf.conf exists. When $LSF_BINDIR is not set, elim.hpc has a security hole that can cause normal users to gain root permissions.
Component: elim.hpc
Impact: elim.hpc does not work properly if $LSF_ENVDIR or $LSF_BINDIR are not set.

P Date
mbatchd may fail when any line of lsb.users is longer than 4352 characters.
Impact: mbatchd core dumps and LSF no longer works.

P Date
When submitting jobs with memory requirements and specifying "span[hosts=1]", if there are no hosts in the cluster that can meet the memory requirement, LSF still makes a slot reservation for the job.
Component: mbschd schmod_default.so
Impact: A high priority job reserves resources but cannot run on the host, causing a waste of resources.

P Date
The unit for the mem, swp, and tmp thresholds that lshosts displays is not changed after defining a different unit using LSF_UNIT_FOR_LIMITS in lsf.conf.
Component: lshosts
Impact: The lshosts output is incorrect.

P Date
When shared resources are configured only on dynamic hosts, the master LIM core dumps if all the dynamic hosts are removed from the cluster.
Component: lim
Impact: LIM core dumps and LSF no longer works.

P Date
Job execution initialization fails if the execution host cannot resolve the submission host. This fix introduces the following parameter:

LSB_DISABLE_SUB_HOST_LOOKUP
Configured in: lsf.conf
Syntax: LSB_DISABLE_SUB_HOST_LOOKUP=Y|N
Disables submission host name lookup when executing jobs. When this parameter is set, the job execution sbatchd does not look up the submission host name when executing or cleaning up the job. LSF will not be able to do any host-dependent automounting.
Default: N. LSF looks up the submission host name when executing jobs.

Component: sbatchd badmin
Impact: The job cannot run on some hosts.
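For clusters where execution hosts cannot reliably resolve submission host names, the parameter above is enabled in lsf.conf. A minimal sketch:

```
# lsf.conf
LSB_DISABLE_SUB_HOST_LOOKUP=Y   # sbatchd skips submission host name lookup;
                                # host-dependent automounting is disabled
```

Changes to this parameter take effect after restarting sbatchd on the execution hosts (for example, with badmin hrestart).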

P Date
Due to a transient name resolution failure causing mbatchd/sbatchd communication issues, finished jobs are reported as running and cannot be killed with bkill. When the compute host's host name is incorrect, the master host cannot receive the job status. Therefore, the system keeps the job in a run status and it cannot be killed. This fix assumes that the host names configured for the LSF cluster are the same as the official names configured in the DNS server or /etc/hosts. Both host names (LSF cluster and DNS server) may include the domain (or not), but they must match.
Impact: mbatchd keeps jobs in running status and end users cannot use bkill to kill their jobs.

P Date
If a blaunch job is submitted using 'bsub -i' with a large input file (larger than 8192 bytes), the job hangs.
Component: res
Impact: LSF jobs hang when a very large input file is specified.

P Date
Improvement for job scheduling performance when there are several single-host parallel job buckets.
Component: mbschd schmod_parallel.so
Impact: Lower mbschd performance has a large impact on the job dispatch rate in a cluster.

P Date
When the status of a host becomes UNAVAIL and there is an advance reservation defined on the host, warning messages do not clarify whether there are more slots reserved than are available on the host, or whether the problem is that the status became UNAVAIL.
Impact: The warning message is misleading and does not tell users how to avoid it.

P Date
When resizing the terminal window of an interactive blaunch job, the job exits with the SIGPROF signal.
Component: res
Impact: Parallel jobs are unexpectedly killed by LSF.

P Date
If a user group is updated with egroup using EGROUP_UPDATE_INTERVAL=1 as defined in lsb.params, and a new user is added to the user group, that user's MAX_JOBS value does not display the correct value.
Component: mbatchd
Impact: The wrong MAX_JOBS value applies to LSF users.

P Date
This fix adds a parameter to lsf.conf to control the memory usage report when using cgroup:

LSB_CGROUP_MEM_INCLUDE_CACHE
Syntax: LSB_CGROUP_MEM_INCLUDE_CACHE=Y|N|y|n
When set to "Y/y", LSF includes rss and cache in the memory usage report when cgroup is enabled. When set to "N/n", LSF only includes rss in the memory usage report when cgroup is enabled.
Default: Y

Component: sbatchd res
Impact: LSF reports that a job is using more memory than it actually uses, causing the job to be unexpectedly killed.

P Date
It takes more than 10 minutes to kill an array job with approximately 1000 elements.
Impact: Array jobs cannot be killed in a short time, causing slots to go unused.

P Date
bhosts and busers may report an incorrect reserved slots value when time-based slot reservation is enabled.
Component: mbschd schmod_reserve.so
Impact: Command output is wrong, which may confuse end users.
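To report only rss (excluding file cache) for cgroup-enabled jobs, the parameter described above can be set in lsf.conf. A minimal sketch:

```
# lsf.conf
LSB_CGROUP_MEM_INCLUDE_CACHE=N   # count only rss, not page cache, toward job memory
```

With the default of Y, file cache counts toward reported job memory, which is what can trigger the unexpected kills described in the entry above.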

P Date
A file descriptor limit that is set higher than a certain value is not respected by sbatchd and RES.
Component: sbatchd
Impact: Jobs depending on a large number of open files fail to run.

P Date
If a Platform MPI job is terminated because a task on the first node ran over the memory limit and was killed by cgroup memory fencing, the bjobs, bhist, and bacct commands cannot display the job exit reason and finished resource usage properly.
Component: sbatchd
Impact: Job accounting information is incorrect.

P Date
When submitting jobs with memory requirements ("rusage[mem=value]") and processor requirements ("span[ptile=value]"), there is an incorrect reservation of hosts that do not have enough memory.
Component: mbschd schmod_parallel.so schmod_reserve.so
Impact: A high priority job reserves resources but cannot run on the host, causing wasted resources.

P Date
When defining "LSB_QUERY_ENH=Y" in lsf.conf and performing several queries, the query child mbatchd might core dump.
Impact: Child mbatchd core dumps, causing b* query commands to fail.

P Date
A user job returns an inconsistent value because whether the job script process is killed by the SIGXFSZ signal depends on whether the job command redirects stdout. This fix introduces the following parameter:

LSB_JOB_SCRIPT_TRAP_SIGNALS
Configured in: lsf.conf
Syntax: LSB_JOB_SCRIPT_TRAP_SIGNALS=signal_name...
A list of the names of signals that are trapped by the job scripts. This parameter prevents the specified signals from killing the job script process. By default, the job scripts trap the SIGTERM, SIGUSR1, SIGUSR2, SIGINT, and SIGHUP signals, so you do not have to define these signals in this parameter. Because the job scripts cannot trap the SIGSTOP and SIGKILL signals, these values are not valid.
Valid values: A space-separated list of signal names. The first 31 signals are valid (from SIGHUP to SIGSYS), except for SIGSTOP and SIGKILL. This parameter is not supported on Windows platforms.
Default: Undefined. The job script does not trap any additional signals except SIGTERM, SIGUSR1, SIGUSR2, SIGINT, and SIGHUP.

Component: sbatchd
Impact: mbatchd keeps jobs in running status and end users cannot use bkill to kill their jobs.

P Date
Scheduling parallel jobs may cause slow mbschd performance.
Component: mbschd schmod_default.so schmod_parallel.so schmod_reserve.so
Impact: Lower mbschd performance has a large impact on the job dispatch rate in a cluster.
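For job scripts that should survive additional signals, the parameter described above takes a space-separated list of signal names in lsf.conf. A hedged sketch (the chosen signals are illustrative; SIGXFSZ is the signal named in the entry above):

```
# lsf.conf
# Also trap SIGXFSZ and SIGPIPE in job scripts, in addition to the
# default SIGTERM, SIGUSR1, SIGUSR2, SIGINT, and SIGHUP.
LSB_JOB_SCRIPT_TRAP_SIGNALS=SIGXFSZ SIGPIPE
```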

P Date
If a krb5 ticket renewal fails, the log messages are not sufficient to assist with troubleshooting.
Component: krbrenewd
Impact: Insufficient warning and error messages make it difficult to debug problems.

P Date
If a job dependency condition is "ended(jobid)", the dependency is broken when the parent job is requeued.
Impact: Job dependency is broken in some cases.

P Date
mbatchd does not accept user group names that end with a slash ("/").
Impact: LSF administrators cannot configure user group names ending with a slash ("/").

P Date
When running bpost on an execution cluster running a newer version of LSF with a submission cluster running LSF 7.0.6, the execution cluster mbatchd core dumps if the submission cluster mbatchd connection is lost and then reconnects.
Impact: mbatchd core dumps and LSF no longer works.

P Date
After upgrading the LSF cluster to version 9.1.3, the process tracking information for jobs that were still unfinished before upgrading is lost and cannot be recovered. This is because LSF changed the cgroup information file name format, so the old cgroup information files are no longer recognized by LSF.
Component: sbatchd
Impact: Cannot collect job run time usage information after upgrading to version 9.1.3.

P Date
When a newly-installed LSF cluster starts up, the master elim may report the following error message in the log file: "readloadupdatefromsubelim: Protocol error: loadcnt cannot be read from elim". This error message is a false alarm. The root cause is that some elims may start, but quickly exit with ELIM_ABORT_VALUE. A race condition might happen where the master elim reads the exited child elim process before receiving the SIGCHLD signal of the child, in which case the read fails and the master elim displays this error message.
Component: melim
Impact: The error message gives LSF administrators concerns about LSF product quality.

P Date
The job run time recorded in the lsb.acct file is incorrect when the job is UNKNOWN and mbatchd is restarted.
Impact: The incorrect job run time recorded in lsb.acct causes RTM to report incorrect job information.

P Date
When running SGI MPI jobs under pam, the CPU time report is incorrect.
Component: pam
Impact: End users do not receive the correct CPU time usage of their parallel jobs within pam.

P Date
bmgroup takes a long time to show new dynamic hosts, and it takes a long time (about 10 minutes) before the new dynamic hosts start accepting jobs.
Impact: It takes a long time for users to know that a dynamic host is ready to use.

P Date
If using a host partition configured with a host group, mbatchd might core dump.
Impact: mbatchd core dumps and LSF no longer works.

P Date
When the argument to blimits -u or -q is a substring of an actual user or queue name, the actual user or queue is still shown. This fix restricts the argument to exact matches and does not perform any expansion.
Component: blimits
Impact: blimits -u or -q shows some limits information that it should not show.

P Date
If a job has the span[ptile='!'] resource requirement, but the user who submitted the job did not define MXJ for any host type/model in lsb.hosts, and the user also did not specify a slot requirement for any host type/model in the span[] clause of the job's submission command:
- Earlier LSF versions ignore the span[ptile='!'] resource requirement and treat the job as an ordinary parallel job.
- This LSF version does not ignore the span[ptile='!'] requirement, but treats the clause as span[ptile=1].
This fix restores the previous LSF behavior for handling span[ptile='!'] resource requirements.
Impact: Some jobs are pending even though there are enough resources.

P Date
When mbatchd replays events and there are events that modify an entire job array to run in a large host group, mbatchd replays slowly and takes a long time to restart the cluster.
Impact: LSF mbatchd is very slow to start up.

P Date
Job execution fails under Ubuntu because /bin/sh is linked to /bin/dash in Ubuntu.
Component: sbatchd
Impact: Job fails to start.
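The span[ptile='!'] clause takes its per-host tile value from the MXJ definitions in lsb.hosts. An illustrative sketch (host names and values are hypothetical):

```
# lsb.hosts - per-host maximum job slots (MXJ)
Begin Host
HOST_NAME     MXJ
hostA         8
default       !
End Host
```

A job submitted with, for example, bsub -n 16 -R "span[ptile='!']" then uses each host's own MXJ as its per-host tile. Without any MXJ definition, the restored behavior treats the job as an ordinary parallel job rather than forcing span[ptile=1].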

28092 P Date
When killing a parallel job (submitted using blaunch), only the SIGKILL signal is received. Therefore it is hard for users to do job cleanup before the jobs are killed.
Component: sbatchd res blaunch
Impact: Users cannot do job cleanup before the job is killed.

P Date
When a line in lsb.users is longer than 4352 characters and you run badmin mbdrestart, running badmin ckconfig results in errors. However, the bconf command is successful.
Impact: The lsb.users file cannot contain too many users in one line.

P Date
If LSF_STRIP_DOMAIN is changed in lsf.conf, mbatchd -C may core dump.
Impact: badmin reconfig does not work.

P Date
bpeek and lsrun log unnecessary warning messages when LSB_KRB_TGT_FWD=Y is set in lsf.conf to control the Ticket Granting Ticket (TGT) forwarding feature, but no TGT is found.
Component: bpeek lsrun
Impact: Misleading error messages.

33435 P Date
mbatchd may core dump if mbatchd replays chunk job events.
Impact: Cluster is down.

P Date
When mbatchd fails to create the child query mbatchd due to a heavy network load, bjobs hangs indefinitely.
Impact: A workflow relying on a bjobs query stops working.

P Date
If the run time of a job is longer than approximately 166 days, then the RUN column width exceeds seven characters. bhist -w does not separate the RUN and USUSP columns in the output (compared to bhist -l output). This results in the two columns being combined together.
Component: bhist
Impact: Difficulty in seeing the RUN and USUSP output.

33980 P Date
When setting up utilization (ut) at the queue level to schedule enough CPU for jobs, LSF rounds the ut numbers, resulting in incorrect use of resources. For example, ut = 0.92 is 12 cores in a cluster and ut = 0.94 is 16 cores. When setting ut = 0.92 or 0.94 in lsb.queues, bqueues -l reports ut = 0.9. When setting ut = 0.96 in lsb.queues, bqueues -l reports ut = 1.
Component: bqueues
Impact: Incorrect use of resources.

P Date
When the system is too busy to release a port, the sbatchd restart fails because the socket failed to initialize.
Component: sbatchd
Impact: Compute nodes become unavailable, which reduces compute capacity.

P Date
Orphan jobs run when a dependent job gives an improper exit code. For example: Job3 depends on Job2 and Job2 depends on Job1. If Job1 dies unexpectedly, Job2 receives TERM_ORPHAN_SYSTEM correctly. However, it gives an exit code of zero, which causes Job3 to run anyway.
Impact: Dependent jobs are allowed to start when they should have been aborted.
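The orphan-job scenario above corresponds to a simple dependency chain. A hedged sketch (job names and scripts are hypothetical):

```
# Hypothetical dependency chain illustrating the orphan scenario:
bsub -J job1 ./step1.sh                      # parent job
bsub -J job2 -w "done(job1)" ./step2.sh      # runs only if job1 completes
bsub -J job3 -w "done(job2)" ./step3.sh      # should not run if job2 is
                                             # terminated as an orphan
```

With the fix, if job1 dies unexpectedly and job2 is terminated with TERM_ORPHAN_SYSTEM, job2 no longer exits with code zero, so the done(job2) condition is not satisfied and job3 does not start.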

33973 P Date
If mbatchd replays job switch events, mbatchd may core dump.
Component: bhist libbat.a libbat.so liblsbstream.so liblsf.a liblsf.so lsf.h lsbatch.h
Impact: The cluster does not work.

P Date
When using bsub -I < file, bsub uses tty as standard input, which is not the correct behavior.
Component: bsub
Impact: bsub -I < file does not work as expected.

P Date
CPU time is not correctly calculated by cgroup when blaunch is used to submit parallel jobs on the first host that runs the job RES.
Component: sbatchd res
Impact: The accuracy of the cputime accounting information is not reliable.

P Date
During periods of high query load (and when MAX_CONCURRENT_JOB_QUERY is set to attempt to improve it), bjobs attempts to query mbatchd every second. This results in poorer mbatchd performance.
Component: bjobs
Impact: Poor mbatchd performance.

34564 P Date
Queue-level pre-execution and queue-level host-based pre-execution scripts run with different user group memberships.
Component: sbatchd
Impact: Host-based pre-execution scripts fail.

P Date
When LSF uses epoll and License Scheduler receives a duplicate mbatchd registration (for example, after you run badmin ckconfig), the connection between bld and mbatchd is broken. Therefore, License Scheduler will not receive any job information.
Impact: No License Scheduler job-related information can be sent to bld, including running jobs and demands for tokens.

P Date
When a preempting job cannot be dispatched due to the guarantee SLA policy, its pending reason is set at the job level, which does not prevent similar jobs from being scheduled in the same session.
Component: mbschd schmod_default.so
Impact: Poor performance.

P Date
daemons.wrap logs unnecessary warning messages when restarting the parent sbatchd.
Component: sbatchd daemons.wrap
Impact: Misleading error messages.

36747 P Date
When running interactive jobs, NIOS may exit with exit code 255 even if the job completed successfully.
Component: res
Impact: The bsub exit code is given at incorrect times.

P Date
mbatchd sometimes dispatches jobs to unavailable (unavail) hosts after running badmin reconfig.
Impact: The wrongly dispatched jobs fail.

P Date
When the same job is requeued, then terminated, bacct counts this as two exited jobs for the "Total number of exited jobs" metric, even though the exit condition is for the same job. Therefore, bacct shows an incorrect number of exited jobs.
Component: bacct
Impact: Inconsistent data shown for bhist and bacct.

P Date
When running "blimits -w", the values of any limits based on EXTERNAL RESOURCES are truncated even though the command is run in wide mode.
Component: blimits
Impact: blimits output is truncated.

39162 P Date A dependency on a job name is rejected when the dependent job is attached to an empty SLA. Unexpected behavior; the root cause is difficult to debug because there is no indication of why the job submission is failing.

P Date When a job name includes special characters such as "%", running the "bjobs -o name" command may fail or display an incomplete job name. Component: bjobs. bjobs core dump.

P Date In some cases, LSF does not honor user-assigned priorities. Component: schmod_parallel.so schmod_reserve.so mbschd. Low-priority jobs are dispatched first and block high-priority jobs.

P Date bjobs shows an error message when the number of concurrent bjobs queries exceeds the value of the MAX_CONCURRENT_JOB_QUERY parameter specified in lsb.params. Component: bjobs. Some jobs may fail.

38996 P Date When sbatchd restarts, mbatchd sends sbatchd a package containing information on running jobs. The calculated size of the package is incorrect, which may lead to a "package full" error. Jobs cannot be scheduled.

P Date When restarting lim and mbatchd while submitting jobs from a float client at the same time, the mbatchd log shows continuous getcommittedruntime error messages. Continuous error messages fill the log.

P Date The job efficiency calculation (in Platform RTM) is incorrect when the job is automatically requeued due to its exit code. At the time of the automatic requeue, the job's runtime is reset to zero but the job's CPU time is not; the CPU time must also be reset to zero when the job is requeued. Job runtime and cputime are inconsistent for rerun jobs.
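The runtime/cputime inconsistency described in the last entry can be illustrated with a small sketch. RTM's actual formula is not reproduced here; the function below only assumes the common definition of efficiency as CPU time consumed per slot-second of wall time, which is enough to show why resetting runtime without resetting cputime inflates the result.

```python
def job_efficiency(cpu_time, run_time, slots=1):
    """Illustrative efficiency metric: CPU seconds per slot-second of wall time."""
    if run_time <= 0:
        return 0.0
    return cpu_time / (run_time * slots)

# Before the automatic requeue: 80 CPU-seconds over 100 wall seconds on 1 slot.
print(job_efficiency(80.0, 100.0))   # 0.8

# After the requeue, runtime restarts at 10 s but cputime was not reset (80 + 8):
print(job_efficiency(88.0, 10.0))    # 8.8 -- impossible (> 1), hence the fix
# With cputime also reset at requeue time, only the new interval is counted:
print(job_efficiency(8.0, 10.0))     # 0.8
```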

44462 P Date When LSF_TMPDIR is configured with a directory other than /tmp, LSB_CHECK_JOB_PID_REUSE_ON_REBOOT does not work. Component: sbatchd. LSB_CHECK_JOB_PID_REUSE_ON_REBOOT does not work and the job PID can be reused, causing LSF to think the job is still running.

P Date When the system is too busy to release a port, the lim/res restart fails because the socket fails to initialize. Component: lim res. Failure of the lim restart causes a failover; if a job is submitted during this event, the submission may fail.

P Date Setting the smoothing factor in the page rate report to a fixed value is inconvenient. This fix introduces a parameter to control the smoothing factor in the page rate report:

Syntax: EGO_LIM_PG_SMOOTH_FACTOR = smoothing_factor

Specifies the smoothing factor when lim reports the host page rate. The smoothing factor controls how fast reported values converge to the instantaneous value. The smoothing_factor value must be an integer between 0 and 10. If set to 0, no smoothing is applied and the reported value is equal to the instantaneous value. The larger the value, the more time LSF needs to react to a page rate change on the host. This parameter is only supported on Linux platforms.

Default: 4

Component: lim. When the index exceeds its threshold incorrectly, too much idle capacity per compute host is lost.

P Date When mbatchd dumps pending reasons, mbatchd incorrectly dumps "dumpcondensedpendingreasons" as well. dumpcondensedpendingreasons fails in the child mbatchd.

P Date Jobs defined with a memory-only guarantee may remain pending because host slots are used by higher-priority jobs. Therefore, memory-only guarantee package pools do not work without a host slot guarantee. Performance slowness and deadlock.

P Date When the pending job count exceeds 50,000, the scheduler experiences performance issues until the number of pending jobs decreases. Component: mbschd schmod_default.so schmod_limit.so schmod_preemption.so. Job scheduling is slow.
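The behavior of a page-rate smoothing factor such as the EGO_LIM_PG_SMOOTH_FACTOR parameter introduced above can be sketched with a simple exponential moving average. The mapping from the 0-10 factor to a smoothing weight below is an assumption chosen for illustration, not LSF's internal formula; it only demonstrates the documented endpoints (factor 0 reports the instantaneous value, larger factors react more slowly).

```python
def smooth(reported_prev, instantaneous, factor):
    """Illustrative exponential smoothing of one page-rate sample.

    factor=0 -> no smoothing (report the instantaneous value);
    larger factors -> the reported value converges more slowly.
    The weight mapping below is an assumed example, not LSF's formula.
    """
    if not 0 <= factor <= 10:
        raise ValueError("smoothing factor must be an integer between 0 and 10")
    alpha = 1.0 - factor / 10.0  # assumed mapping: factor 0 -> alpha 1, factor 10 -> alpha 0
    return alpha * instantaneous + (1.0 - alpha) * reported_prev

# A sudden page-rate spike from 0 to 100 pages/second:
print(smooth(0.0, 100.0, 0))   # factor 0 tracks the spike immediately
print(smooth(0.0, 100.0, 4))   # a default-like factor reacts more gradually
```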

The following solutions have been delivered in LSF Version Fix Pack 3 between 30th May 2014 and 8th June 2015:

Date Support for the -R option of the brestart command, to let end users change the resource requirements of a restarted job. The syntax of the -R option of brestart is the same as that of the -R option of the bsub and bmod commands. Component: brestart.

Date This solution dumps the contents of the job buckets to a file in order to address the following issues: a) A smaller number of job buckets in the system shortens the scheduling cycle. b) The total number of job buckets can be shown in the "badmin perfmon view" output, but there was no easy way to see the job buckets themselves. c) There is no easy way to track down the cause of a large number of job buckets. To generate the dump file containing all the current job buckets in the system, run badmin diagnose -c jobreq. The file contains the job buckets in XML format by default. The default file name "jobreq_<host_name>_<date_and_time>.xml" is used if "-f logfile_name" is not specified. The file location is DIAGNOSE_LOGDIR if configured in lsb.params; otherwise, the file is in LSF_LOGDIR. Component: bapp badmin bhist bjobs bparams bqueues sbatchd mbatchd mbschd schmod_default.so schmod_parallel.so schmod_fairshare.so schmod_affinity.so schmod_advrsv.so schmod_dc.so.
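As a sketch of how such a dump could be post-processed, the snippet below counts buckets and pending jobs in a jobreq XML file. The element names (jobreq, bucket, njobs) are hypothetical, since this entry does not document the actual schema produced by badmin diagnose -c jobreq; only the parsing pattern is the point.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of a jobreq dump; the real element names produced by
# "badmin diagnose -c jobreq" may differ -- this only illustrates post-processing.
sample = """
<jobreq>
  <bucket id="1"><njobs>120</njobs></bucket>
  <bucket id="2"><njobs>3</njobs></bucket>
  <bucket id="3"><njobs>47</njobs></bucket>
</jobreq>
"""

root = ET.fromstring(sample)
buckets = root.findall("bucket")
print("total buckets:", len(buckets))
print("total pending jobs:", sum(int(b.findtext("njobs")) for b in buckets))
```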

Date When using badmin ckconfig, LSF checks the host information from NIS or DNS. If the network is not stable and responds slowly, this process can take a long time, causing mbatchd to stop responding. The following parameter has been introduced in lsb.params:

Syntax: IGNORE_HOSTNAME_CHECK=Y|N

If this parameter is enabled, LSF ignores the check for host information in NIS or DNS.

Default: N

Date This fix allows LSF users or administrators to use wildcard characters in LSB_JOB_TMPDIR, JOB_SPOOL_DIR, the job CWD, and job output directories, including the following characters:
- LSB_JOB_TMPDIR: %H
- JOB_SPOOL_DIR: %H, %P, %U, %C, and %JG
- Job CWD and output directories: %H
For more details on how to use these wildcard characters with LSF working on GPFS, refer to IBM Platform LSF Best Practices and Tips. Component: sbatchd bparams.

Date Add support to perform logic after a job is submitted by bsub or after a job is modified by bmod. Similar to how esub scripts are run before job submission or job modification, espub scripts are run after the operation. Component: bsub bmod brestart mesub.
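An illustrative lsb.params fragment combining the two directory-related settings above (the spool path is an example, and the inline comments are annotations rather than part of any shipped configuration):

```
# lsb.params (illustrative example)
Begin Parameters
IGNORE_HOSTNAME_CHECK=Y            # skip NIS/DNS host checks during badmin ckconfig
JOB_SPOOL_DIR=/share/lsf/spool/%H  # %H expands per execution host
End Parameters
```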

Date When the LSF_NIOS_PEND_TIMEOUT environment variable is set, interactive jobs cannot be executed after the LSF_NIOS_PEND_TIMEOUT value expires. The job is killed and returns a message such as "Job <xxx> is being terminated". You can use the LSF_NIOS_DIE_CMD environment variable to specify a customized command and output message when the LSF_NIOS_PEND_TIMEOUT value expires. See the following example:

user@host1: setenv LSF_NIOS_PEND_TIMEOUT 1
user@host1: setenv LSF_NIOS_DIE_CMD "bkill %J > /dev/null; echo job %J is terminated by bkill;"
user@host1: echo $LSF_NIOS_DIE_CMD
bkill %J > /dev/null; echo job %J is terminated by bkill;
user@host1: bsub -I "echo test"
Job <16> is submitted to default queue <normal>.
<<Waiting for dispatch...>>
job 16 is terminated by bkill

About the LSF_NIOS_DIE_CMD environment variable:
1. The default value is "bkill jobid".
2. LSF_NIOS_DIE_CMD supports the %J variable, so you can use the job ID when you specify the custom command for LSF_NIOS_DIE_CMD.
Component: bsub.

Date Add support to expand the allremote keyword that appears in the HOST column of the bmgroup output. By expanding allremote, bmgroup displays leased-in hosts from other clusters instead of the allremote keyword. To enable this feature, define LSB_BMGROUP_ALLREMOTE_EXPAND=Y in the appropriate configuration file: to enable "allremote" to be expanded for all users, edit lsf.conf and define LSB_BMGROUP_ALLREMOTE_EXPAND=Y; to enable "allremote" to be expanded only for a specific user, set LSB_BMGROUP_ALLREMOTE_EXPAND=Y as an environment variable in the user's local environment before issuing the command. Component: bmgroup.

Date For Red Hat Enterprise Linux (RHEL) version 6.6 Beta and later, there is a MemAvailable field in /proc/meminfo. If MemAvailable is present, read this value directly from /proc/meminfo for the available memory load indicator instead of calculating the value. Component: lim.

Date This enhancement allows the system to kill the job using the most CPU when the average logical CPU r15m value and the ut value both reach configured thresholds on the host, allowing other jobs on the host to run smoothly. A job is considered the worst CPU-offending job on a host if it is using the most CPU (system time + user time) per average assigned slot during the check period. When a job is killed as the worst CPU-offending job, the exit reason is the same as when a job's normal CPU limit is reached: "job killed after reaching LSF CPU usage limit". This solution is configured through a new configuration parameter in lsf.conf:

Syntax: LSB_CPU_USAGE_ENF_CONTROL=<Average Logic CPU r15m Threshold>:<UT Threshold>:<Check Interval>

1) Average Logic CPU r15m Threshold: A threshold for the maximum allowed quotient of the host's lsload r15m value divided by the number of logical CPUs on the host; that is, the average CPU queue length during the last 15 minutes for one logical CPU on the host. It must be a floating-point number equal to or greater than zero, for example, 7.8, 2.1, or 0.9.
2) UT Threshold: A threshold for the maximum allowed value of the host's lsload ut value. The ut value is the CPU utilization exponentially averaged over the last minute, between 0 and 1. It must be a floating-point number between 0 and 1, for example, 0.4 or 0.5.
3) Check Interval: The minimum period of time between two consecutive checks of the host's r15m and ut information. This value must be no less than the value of SBD_SLEEP_TIME, and the unit is in

seconds. For example, 20, 40, or 60.
4) The host is considered to be in CPU overload when both <Average Logic CPU r15m Threshold> and <UT Threshold> have been reached.
5) This parameter does not affect jobs running across multiple hosts.

Default: Not defined
Component: sbatchd

Date LSF's global fairshare scheduling policy divides the processing power of Platform MultiCluster (MultiCluster) and the LSF/XL feature of Platform LSF Advanced Edition among users to provide fair access to all resources, so that every user can use the resources of multiple clusters according to their configured shares. Global fairshare is supported in Platform LSF Standard Edition and Platform LSF Advanced Edition. Component: mbatchd sbatchd mbschd gpolicyd badmin bgpinfo bqueues schmod_advrsv.so schmod_affinity.so schmod_aps.so schmod_bluegene.so schmod_cpuset.so schmod_craylinux.so schmod_crayx1.so schmod_dc.so schmod_default.so schmod_dist.so schmod_fairshare.so schmod_fcfs.so schmod_jobweight.so schmod_limit.so schmod_mc.so schmod_parallel.so schmod_preemption.so schmod_pset.so schmod_ps.so schmod_reserve.so schmod_rms.so schmod_xl.so libbat.a libbat.so liblsf.a liblsf.so lsbatch.h.

Date Add support to show the settings for pending time, interactive jobs, exclusive jobs, and run time limit, either by running bjobs -o pend_time, bjobs -o interactive, bjobs -o exclusive, or bjobs -o runtimelimit/rtlimit, or by adding pend_time, interactive, exclusive, or runtimelimit/rtlimit to LSB_BJOBS_FORMAT in lsf.conf. For example:

bjobs -o "jobid pend_time interactive exclusive runtimelimit"
JOBID PEND_TIME INTERACTIVE EXCLUSIVE RUNTIMELIMIT
1     20        Y           N         100.0/host
                N           Y         -

1. For a pending job, the PEND_TIME is the current time minus the job's submission time.

2. For a dispatched (running or suspended) job, the PEND_TIME is the job's start time minus the job's submission time.
3. For a requeued, migrated, or rerun job, the PEND_TIME is the current time (the re-dispatch time) minus the time the job was requeued, migrated, or rerun.
4. Jobs that are submitted with the following bsub options are treated as interactive jobs: -I, -Ip, -Is, -IS, -ISp, -ISs, -IX.
5. bjobs -o exclusive shows Y for jobs that are submitted with the -x option, a compute unit exclusive request, or an affinity exclusive request.
6. The RUNTIMELIMIT is the merged value of the job-level run time limit assignment, the application-level run time limit setting, and the queue-level run time limit setting. If ABS_RUNLIMIT is enabled, the RUNTIMELIMIT is not normalized by the host CPU factor.
7. For IBM Platform LSF MultiCluster ("MultiCluster") with a job-level run time limit specified, "bjobs -o runtimelimit" shows the normalized run time on both the submission cluster and the execution cluster. Defining the run time limit at the application or queue level in the submission cluster does not affect the job's run time on the execution cluster, so defining it in the submission cluster is meaningless. However, when the run time limit is defined at the application or queue level in the submission cluster, running "bjobs -o runtimelimit" in the submission cluster still shows the combined run time limit of the submission cluster as being different from the effective run time limit at the execution cluster, while running "bjobs -o runtimelimit" in the execution cluster shows the effective run time limit.
Component: bjobs.

Date Improvements to job chunking to address the following issues: A job's running time is not always predictable at the time of its submission. If such jobs are chunked but actually run for a very long time, other jobs in the same chunk are blocked and wait for the long-running job to finish.
There is no way to reschedule these waiting jobs even if there are enough free resources. Traditional LSF job chunking always chunks jobs together regardless of whether those jobs could run without being chunked, which in some scenarios impacts resource utilization. In lsb.queues or lsb.applications, configure the new parameter CHUNK_MAX_WAIT_TIME together with CHUNK_JOB_SIZE on some queues or application profiles.

Syntax: CHUNK_MAX_WAIT_TIME = <seconds>

If a job is in WAIT status for longer than the configured time period, LSF removes the job from the job chunk and reschedules it. The LSF scheduler ensures that such jobs are run instead of being chunked as a waiting member again when there are eligible resources. The application profile setting overrides the queue-level configuration. Note: After a chunk job's waiting time exceeds CHUNK_MAX_WAIT_TIME, it may continue in WAIT status for one or more SBD_SLEEP_TIME cycles before being rescheduled, because sbatchd checks the timeout periodically and the check can be delayed if sbatchd is busy handling requests from mbatchd. In lsb.params, configure the new parameter ADAPTIVE_CHUNKING=Y to enable this feature. Note: This feature is not supported in the backfill and preemption phases in LSF. Component: bapp badmin bhist bjobs bparams bqueues sbatchd mbatchd mbschd schmod_default.so schmod_parallel.so schmod_fairshare.so schmod_affinity.so schmod_advrsv.so schmod_dc.so.

P Date Enhancements to LSF when running in the Linux x64 environment:
1. elim.gpu.ext reports GPU utilization:
a) elim.gpu.ext reports the utilization for each GPU on the host.
b) elim.gpu.ext reports the average utilization for all GPUs in shared mode on the host.
2. Optimized GPU allocation policies inside sbatchd:
a) For exclusive mode GPUs, try to allocate GPUs from the same NUMA node as the cores (best effort) at run time for affinity jobs. If there are multiple GPUs on multiple PCI buses in one NUMA node, LSF considers PCI bus information after considering affinity between CPU and GPU, and then attempts to allocate GPUs from the same PCI bus (best effort).
b) For exclusive mode GPUs, try to allocate GPUs from the same PCI bus (best effort) at run time for non-affinity jobs.
c) For shared mode GPUs, allocate shared mode GPUs to jobs using round-robin distribution.
3. Display which GPUs have been allocated to a job via bpost.
For more details on configuring this service patch, refer to the README file inside the patch package.
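A sketch of how the chunking parameters described above might be configured (the queue name, chunk size, and wait time are illustrative values, and the inline comments are annotations):

```
# lsb.queues (illustrative example)
Begin Queue
QUEUE_NAME          = short
CHUNK_JOB_SIZE      = 4      # chunk up to 4 jobs together
CHUNK_MAX_WAIT_TIME = 600    # free a job that has been in WAIT for over 10 minutes
End Queue

# lsb.params (illustrative example)
Begin Parameters
ADAPTIVE_CHUNKING=Y          # enable the adaptive chunking feature
End Parameters
```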



Platform LSF Desktop Support User s Guide

Platform LSF Desktop Support User s Guide Platform LSF Desktop Support User s Guide Version 7.0 Update 2 Release date: November 2007 Last modified: December 4 2007 Support: support@platform.com Comments to: doc@platform.com Copyright We d like

More information

Release Notes for Platform LSF. Platform LSF Version 7.0 Update 6 Release date: September 2009 Last modified: September 1, 2009

Release Notes for Platform LSF. Platform LSF Version 7.0 Update 6 Release date: September 2009 Last modified: September 1, 2009 Platform LSF Version 7.0 Update 6 Release date: September 2009 Last modified: September 1, 2009 Contents Release Notes for Platform LSF... 3 Upgrade and Compatibility Notes... 3 What s Changed in Platform

More information

TORQUE Resource Manager Release Notes

TORQUE Resource Manager Release Notes TORQUE Resource Manager 5.1.3 Release Notes The release notes file contains the following sections: New Features on page 2 Differences on page 4 Known Issues on page 7 Resolved Issues on page 8 1 New Features

More information

SmartSuspend. Achieve 100% Cluster Utilization. Technical Overview

SmartSuspend. Achieve 100% Cluster Utilization. Technical Overview SmartSuspend Achieve 100% Cluster Utilization Technical Overview 2011 Jaryba, Inc. SmartSuspend TM Technical Overview 1 Table of Contents 1.0 SmartSuspend Overview 3 2.0 How SmartSuspend Works 3 3.0 Job

More information

IBM Platform LSF 9.1.3

IBM Platform LSF 9.1.3 IBM Platform LSF 9.1.3 Bill.McMillan@uk.ibm.com Global Product Portfolio Manager, IBM Platform LSF Family 1 IBM Platform LSF Family Key Drivers Unceasing demand for Compute Scalability and Throughput Node

More information

Using Platform LSF HPC Features

Using Platform LSF HPC Features Using Platform LSF HPC Features Version 8 Release date: January 2011 Last modified: January 10, 2011 Support: support@platform.com Comments to: doc@platform.com Copyright We d like to hear from you 1994-2011,

More information

Using Platform LSF HPC

Using Platform LSF HPC Using Platform LSF HPC Version 7 Update 5 Release date: March 2009 Last modified: March 13, 2009 Support: support@platform.com Comments to: doc@platform.com Copyright We d like to hear from you 1994-2009,

More information

THE PROCESS ABSTRACTION. CS124 Operating Systems Winter , Lecture 7

THE PROCESS ABSTRACTION. CS124 Operating Systems Winter , Lecture 7 THE PROCESS ABSTRACTION CS124 Operating Systems Winter 2015-2016, Lecture 7 2 The Process Abstraction Most modern OSes include the notion of a process Term is short for a sequential process Frequently

More information

Using Platform LSF MultiCluster. Version 6.1 November 2004 Comments to:

Using Platform LSF MultiCluster. Version 6.1 November 2004 Comments to: Using Platform LSF MultiCluster Version 6.1 November 2004 Comments to: doc@platform.com Copyright We d like to hear from you Document redistribution policy Internal redistribution Trademarks 1994-2004

More information

B. V. Patel Institute of Business Management, Computer &Information Technology, UTU

B. V. Patel Institute of Business Management, Computer &Information Technology, UTU BCA-3 rd Semester 030010304-Fundamentals Of Operating Systems Unit: 1 Introduction Short Answer Questions : 1. State two ways of process communication. 2. State any two uses of operating system according

More information

2/26/2017. For instance, consider running Word Count across 20 splits

2/26/2017. For instance, consider running Word Count across 20 splits Based on the slides of prof. Pietro Michiardi Hadoop Internals https://github.com/michiard/disc-cloud-course/raw/master/hadoop/hadoop.pdf Job: execution of a MapReduce application across a data set Task:

More information

Process management. What s in a process? What is a process? The OS s process namespace. A process s address space (idealized)

Process management. What s in a process? What is a process? The OS s process namespace. A process s address space (idealized) Process management CSE 451: Operating Systems Spring 2012 Module 4 Processes Ed Lazowska lazowska@cs.washington.edu Allen Center 570 This module begins a series of topics on processes, threads, and synchronization

More information

Kea Messages Manual. Kea Messages Manual

Kea Messages Manual. Kea Messages Manual Kea Messages Manual i Kea Messages Manual Kea Messages Manual ii Copyright 2011-2015 Internet Systems Consortium, Inc. Kea Messages Manual iii Contents 1 Introduction 1 2 Kea Log Messages 2 2.1 ALLOC Module....................................................

More information

CS 167 Final Exam Solutions

CS 167 Final Exam Solutions CS 167 Final Exam Solutions Spring 2018 Do all questions. 1. [20%] This question concerns a system employing a single (single-core) processor running a Unix-like operating system, in which interrupts are

More information

CS307: Operating Systems

CS307: Operating Systems CS307: Operating Systems Chentao Wu 吴晨涛 Associate Professor Dept. of Computer Science and Engineering Shanghai Jiao Tong University SEIEE Building 3-513 wuct@cs.sjtu.edu.cn Download Lectures ftp://public.sjtu.edu.cn

More information

System Programming. Signals I

System Programming. Signals I Content : by Dr. B. Boufama School of Computer Science University of Windsor Instructor: Dr. A. Habed adlane@cs.uwindsor.ca http://cs.uwindsor.ca/ adlane/60-256 Content Content 1 Introduction 2 3 Signals

More information

Ch 4 : CPU scheduling

Ch 4 : CPU scheduling Ch 4 : CPU scheduling It's the basis of multiprogramming operating systems. By switching the CPU among processes, the operating system can make the computer more productive In a single-processor system,

More information

Batches and Commands. Overview CHAPTER

Batches and Commands. Overview CHAPTER CHAPTER 4 This chapter provides an overview of batches and the commands contained in the batch. This chapter has the following sections: Overview, page 4-1 Batch Rules, page 4-2 Identifying a Batch, page

More information

Univa Grid Engine Troubleshooting Quick Reference

Univa Grid Engine Troubleshooting Quick Reference Univa Corporation Grid Engine Documentation Univa Grid Engine Troubleshooting Quick Reference Author: Univa Engineering Version: 8.4.4 October 31, 2016 Copyright 2012 2016 Univa Corporation. All rights

More information

CSE 451: Operating Systems Winter Module 4 Processes. Mark Zbikowski Allen Center 476

CSE 451: Operating Systems Winter Module 4 Processes. Mark Zbikowski Allen Center 476 CSE 451: Operating Systems Winter 2015 Module 4 Processes Mark Zbikowski mzbik@cs.washington.edu Allen Center 476 2013 Gribble, Lazowska, Levy, Zahorjan Process management This module begins a series of

More information

Dr. Rafiq Zakaria Campus. Maulana Azad College of Arts, Science & Commerce, Aurangabad. Department of Computer Science. Academic Year

Dr. Rafiq Zakaria Campus. Maulana Azad College of Arts, Science & Commerce, Aurangabad. Department of Computer Science. Academic Year Dr. Rafiq Zakaria Campus Maulana Azad College of Arts, Science & Commerce, Aurangabad Department of Computer Science Academic Year 2015-16 MCQs on Operating System Sem.-II 1.What is operating system? a)

More information

Using Platform LSF on Windows. Version 6.0 February 2004 Comments to:

Using Platform LSF on Windows. Version 6.0 February 2004 Comments to: Using Platform LSF on Windows Version 6.0 February 2004 Comments to: doc@platform.com Copyright We d like to hear from you Document redistribution policy Internal redistribution Trademarks 1994-2004 Platform

More information

Programs. Program: Set of commands stored in a file Stored on disk Starting a program creates a process static Process: Program loaded in RAM dynamic

Programs. Program: Set of commands stored in a file Stored on disk Starting a program creates a process static Process: Program loaded in RAM dynamic Programs Program: Set of commands stored in a file Stored on disk Starting a program creates a process static Process: Program loaded in RAM dynamic Types of Processes 1. User process: Process started

More information

Introduction to High-Performance Computing (HPC)

Introduction to High-Performance Computing (HPC) Introduction to High-Performance Computing (HPC) Computer components CPU : Central Processing Unit CPU cores : individual processing units within a Storage : Disk drives HDD : Hard Disk Drive SSD : Solid

More information

IRIX Resource Management Plans & Status

IRIX Resource Management Plans & Status IRIX Resource Management Plans & Status Dan Higgins Engineering Manager, Resource Management Team, SGI E-mail: djh@sgi.com CUG Minneapolis, May 1999 Abstract This paper will detail what work has been done

More information

OPERATING SYSTEMS CS3502 Spring Processor Scheduling. Chapter 5

OPERATING SYSTEMS CS3502 Spring Processor Scheduling. Chapter 5 OPERATING SYSTEMS CS3502 Spring 2018 Processor Scheduling Chapter 5 Goals of Processor Scheduling Scheduling is the sharing of the CPU among the processes in the ready queue The critical activities are:

More information

Tasks. Task Implementation and management

Tasks. Task Implementation and management Tasks Task Implementation and management Tasks Vocab Absolute time - real world time Relative time - time referenced to some event Interval - any slice of time characterized by start & end times Duration

More information

Source OID Message Severity Cause Action

Source OID Message Severity Cause Action 13 CHAPTER This section describes the Prime Network system events. System events appear in the Prime Network Events System tab. They include a variety of events pertaining to the system activities, from

More information

RELEASE NOTES. Version NEW FEATURES AND IMPROVEMENTS

RELEASE NOTES. Version NEW FEATURES AND IMPROVEMENTS S AND S Implementation of the Google Adwords connection type Implementation of the NetSuite connection type Improvements to the Monarch Swarm Library Column sorting and enhanced searching Classic trapping

More information

/6)%DWFK$GPLQLVWUDWRU V4XLFN 5HIHUHQFH

/6)%DWFK$GPLQLVWUDWRU V4XLFN 5HIHUHQFH /6)%DWFK$GPLQLVWUDWRU V4XLFN 5HIHUHQFH Version 3.2 3ODWIRUP&RPSXWLQJ&RUSRUDWLRQ /6)%DWFK$GPLQLVWUDWRU V4XLFN5HIHUHQFH Copyright 1994-1998 Platform Computing Corporation All rights reserved. This document

More information

The RWTH Compute Cluster Environment

The RWTH Compute Cluster Environment The RWTH Compute Cluster Environment Tim Cramer 29.07.2013 Source: D. Both, Bull GmbH Rechen- und Kommunikationszentrum (RZ) The RWTH Compute Cluster (1/2) The Cluster provides ~300 TFlop/s No. 32 in TOP500

More information

PBS PROFESSIONAL VS. MICROSOFT HPC PACK

PBS PROFESSIONAL VS. MICROSOFT HPC PACK PBS PROFESSIONAL VS. MICROSOFT HPC PACK On the Microsoft Windows Platform PBS Professional offers many features which are not supported by Microsoft HPC Pack. SOME OF THE IMPORTANT ADVANTAGES OF PBS PROFESSIONAL

More information

Running the model in production mode: using the queue.

Running the model in production mode: using the queue. Running the model in production mode: using the queue. 1) Codes are executed with run scripts. These are shell script text files that set up the individual runs and execute the code. The scripts will seem

More information

CS370 Operating Systems

CS370 Operating Systems CS370 Operating Systems Colorado State University Yashwant K Malaiya Fall 2017 Lecture 10 Slides based on Text by Silberschatz, Galvin, Gagne Various sources 1 1 Chapter 6: CPU Scheduling Basic Concepts

More information

Chapter 9: Virtual Memory

Chapter 9: Virtual Memory Chapter 9: Virtual Memory Silberschatz, Galvin and Gagne 2013 Chapter 9: Virtual Memory Background Demand Paging Copy-on-Write Page Replacement Allocation of Frames Thrashing Memory-Mapped Files Allocating

More information

Operating Systems Comprehensive Exam. Spring Student ID # 3/16/2006

Operating Systems Comprehensive Exam. Spring Student ID # 3/16/2006 Operating Systems Comprehensive Exam Spring 2006 Student ID # 3/16/2006 You must complete all of part I (60%) You must complete two of the three sections in part II (20% each) In Part I, circle or select

More information

CSE 410: Computer Systems Spring Processes. John Zahorjan Allen Center 534

CSE 410: Computer Systems Spring Processes. John Zahorjan Allen Center 534 CSE 410: Computer Systems Spring 2018 Processes John Zahorjan zahorjan@cs.washington.edu Allen Center 534 1. What is a process? Processes 2. What's the process namespace? 3. How are processes represented

More information

Installation Instructions for Platform Suite for SAS Version 7.1 for UNIX

Installation Instructions for Platform Suite for SAS Version 7.1 for UNIX Installation Instructions for Platform Suite for SAS Version 7.1 for UNIX Copyright Notice The correct bibliographic citation for this manual is as follows: SAS Institute Inc., Installation Instructions

More information

Cube Analyst Drive. Release Summary. Citilabs

Cube Analyst Drive. Release Summary. Citilabs Cube Analyst Drive Release Summary Cube Analyst Drive Release Summary Citilabs Cube Analyst Drive Release Summary This section documents changes included in each release of Cube Analyst Drive. You may

More information

Memory may be insufficient. Memory may be insufficient.

Memory may be insufficient. Memory may be insufficient. Error code Less than 200 Error code Error type Description of the circumstances under which the problem occurred Linux system call error. Explanation of possible causes Countermeasures 1001 CM_NO_MEMORY

More information

Admin Guide ( Unix System Administration )

Admin Guide ( Unix System Administration ) Admin Guide ( Unix System Administration ) ProFTPD Server Configuration ProFTPD is a secure and configurable FTP server, written for use on Unix and Unix-like operating systems. ProFTPD is modeled around

More information

Chapter 5: CPU Scheduling

Chapter 5: CPU Scheduling COP 4610: Introduction to Operating Systems (Fall 2016) Chapter 5: CPU Scheduling Zhi Wang Florida State University Contents Basic concepts Scheduling criteria Scheduling algorithms Thread scheduling Multiple-processor

More information

Radiometer AQT90 FLEX Troponin I ERROR CODES

Radiometer AQT90 FLEX Troponin I ERROR CODES Radiometer AQT90 FLEX Troponin I ERROR CODES The meter automatically performs electronic self-tests, i.e. environmental check ensures that the analyser is ready for testing and that consumables have not

More information

VMware vrealize operations Management Pack FOR. PostgreSQL. User Guide

VMware vrealize operations Management Pack FOR. PostgreSQL. User Guide VMware vrealize operations Management Pack FOR PostgreSQL User Guide TABLE OF CONTENTS 1. Purpose... 3 2. Introduction to the Management Pack... 3 2.1 How the Management Pack Collects Data... 3 2.2 Data

More information

IBM Spectrum LSF Version 10 Release 1.0. Using IBM Spectrum LSF License Scheduler IBM SCNN-NNNN-00

IBM Spectrum LSF Version 10 Release 1.0. Using IBM Spectrum LSF License Scheduler IBM SCNN-NNNN-00 IBM Spectrum LSF Version 10 Release 1.0 Using IBM Spectrum LSF License Scheduler IBM SCNN-NNNN-00 IBM Spectrum LSF Version 10 Release 1.0 Using IBM Spectrum LSF License Scheduler IBM SCNN-NNNN-00 Note

More information

Hadoop MapReduce Framework

Hadoop MapReduce Framework Hadoop MapReduce Framework Contents Hadoop MapReduce Framework Architecture Interaction Diagram of MapReduce Framework (Hadoop 1.0) Interaction Diagram of MapReduce Framework (Hadoop 2.0) Hadoop MapReduce

More information

Topic 4 Scheduling. The objective of multi-programming is to have some process running at all times, to maximize CPU utilization.

Topic 4 Scheduling. The objective of multi-programming is to have some process running at all times, to maximize CPU utilization. Topic 4 Scheduling The objective of multiprogramming is to have some process running at all times, to maximize CPU utilization. The objective of time sharing is to switch the CPU among processes so frequently.

More information

Mon Sep 17, 2007 Lecture 3: Process Management

Mon Sep 17, 2007 Lecture 3: Process Management Mon Sep 17, 2007 Lecture 3: Process Management September 19, 2007 1 Review OS mediates between hardware and user software QUIZ: Q: Name three layers of a computer system where the OS is one of these layers.

More information