Each host in a Windows NT LSF cluster runs several services, which coordinate the operation of LSF. These services include the Load Information Manager (LIM), which provides information about host load to the LSF system, the Remote Execution Server (RES), which allows programs to be executed on a host in the cluster, and the Slave Batch Daemon (sbatchd), which coordinates the execution of batch jobs on a host in the cluster. The HPVM front-end server interacts with these daemons by utilizing functions exposed in an Application Programming Interface (API) provided by Platform Computing for the LSF system. The API for dealing with the LSF Base system, which consists of the LIM and the RES, is provided by the `lslib.lib' library. The API for dealing with the LSF Batch system, which consists of the sbatchd, is provided by the `lsblib.lib' library. Both libraries are part of the LSF Base system software distribution.
Because the HPVM front end relies on functions provided by the LSF system, make sure that LSF is functioning properly before attempting to use the HPVM front end. When troubleshooting problems with the HPVM front end, it is often useful to test the LSF cluster directly: run a simple command on a cluster host with the LSF `lsgrun' command, and submit a test batch job with the `bsub' command. If either test does not execute properly on the LSF cluster, consult the troubleshooting sections of the LSF Administration manuals.
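The check described above could be scripted roughly as follows. This is only a sketch under several assumptions that are not part of the HPVM or LSF documentation: Python 3.7 or later is available, the LSF commands are on the PATH, the target host has a `hostname' program, and the function names `build_checks' and `run_checks' are made up for illustration. The `lsid' step is an addition that is not mentioned above; it is included only as a quick way to see whether the LIM answers at all.

```python
import subprocess


def build_checks(host):
    """Return (label, argv) pairs, ordered from the lowest LSF layer up.

    `host` is a placeholder for the name of a host in your cluster.
    """
    return [
        # lsid contacts the LIM and prints the cluster and master names.
        ("LIM", ["lsid"]),
        # lsgrun runs a command on the named host through the RES.
        ("RES", ["lsgrun", "-m", host, "hostname"]),
        # bsub submits a batch job restricted to the named host.
        ("Batch", ["bsub", "-m", host, "hostname"]),
    ]


def run_checks(host, runner=subprocess.run):
    """Run each check in order and stop at the first failure.

    Returns a list of (label, ok) pairs. `runner` is injectable so the
    logic can be exercised without an LSF installation.
    """
    results = []
    for label, argv in build_checks(host):
        try:
            completed = runner(argv, capture_output=True, text=True)
            ok = completed.returncode == 0
        except OSError:
            # The command could not be started, e.g. it is not on the PATH.
            ok = False
        results.append((label, ok))
        if not ok:
            break
    return results


# Example (hypothetical host name):
#     for label, ok in run_checks("hostA"):
#         print(label, "ok" if ok else "FAILED")
```

Stopping at the first failure points at the lowest layer that is broken, which is usually the one to investigate first. Note that a successful `bsub' only means the job was accepted for scheduling; whether it actually ran still has to be confirmed separately, for example with the `bjobs' command.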