LSF 3.0 and Solaris 2.6 load indices
2007-12-24 19:24:00
using LSF 3 for batch job support. As LSF relies on the system load indices
when selecting jobs and hosts, I thought I would start there. Immediately
I've come across a problem for which I can find no answer in the docs, on the
LSF web page, or from Sun.
lsload is supposed to report the current values of the system load indices, e.g.:
$ lsload
HOST_NAME status r15s r1m r15m ut pg ls it tmp swp mem
irwell ok 8.1 7.8 7.7 100% 2e+03 59 0 1461M 1463M 107M
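To make the columns above easier to work with, here is a minimal Python sketch that turns lsload output into one dictionary per host. It assumes the exact layout shown above (a header line, then one whitespace-separated row per host) and only handles the value formats that appear in this sample ('%', the 'M' suffix, plain and exponent numbers), plus '-', which is assumed here to mark an unavailable index. The function names are hypothetical; this is not an LSF API.

```python
# Sketch: parse plain-text `lsload` output, assuming the layout shown above.
SAMPLE = """\
HOST_NAME status r15s r1m r15m ut pg ls it tmp swp mem
irwell ok 8.1 7.8 7.7 100% 2e+03 59 0 1461M 1463M 107M
"""


def parse_value(text):
    """Convert one lsload field ('100%', '2e+03', '1461M', ...) to a float."""
    if text == "-":
        return None                  # assumed marker for an unavailable index
    if text.endswith("%"):
        return float(text[:-1])      # percentage, kept on a 0-100 scale
    if text.endswith("M"):
        return float(text[:-1])      # megabytes
    return float(text)               # plain numbers, including '2e+03'


def parse_lsload(output):
    """Return {host: {index: value}}; 'status' is kept as a string."""
    lines = [line.split() for line in output.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    hosts = {}
    for row in rows:
        fields = dict(zip(header, row))
        host = fields.pop("HOST_NAME")
        values = {"status": fields.pop("status")}
        values.update({name: parse_value(v) for name, v in fields.items()})
        hosts[host] = values
    return hosts
```

For the sample above, `parse_lsload(SAMPLE)["irwell"]["mem"]` gives 107.0, i.e. the 107M of free memory that lsload reported. Any other unit suffix would raise a ValueError, which is deliberate: it is better to fail than to guess at a format not seen here.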
Now, as I understand Solaris, the LSF values for ut, it and mem are always
going to be of little use. Any free memory on a Solaris system will be used
to buffer disk IO, so memory will always drift towards 100% used; it (idle
time) will always be close to zero when a Solaris system is IO bound, as
iowait is not counted as idle time; and ut (CPU utilisation) is again
reported differently from many other Unix systems: in the example above the
24-CPU system had only about 30% CPU utilisation at the time lsload reported
100%. I also have concerns about swp (free swap): the system had 7G of
active swap with only 7G of physical memory at the time of the above report,
so starting any more large jobs would be a bad idea, yet a decision based
just on the available free swap space would not reveal that.
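The accounting point above can be shown with a little arithmetic. This sketch assumes per-CPU cumulative tick counters split into user, kernel, wait (iowait) and idle time, the four CPU states Solaris keeps, and computes utilisation between two samples with iowait counted either as busy or as idle. The tick values in the usage note are hypothetical, chosen to mirror the roughly 30% figure quoted. The helper swap_overcommit (a hypothetical name) simply compares swap in use with physical memory.

```python
from dataclasses import dataclass


@dataclass
class CpuTicks:
    """Cumulative CPU tick counters for one sample (values are hypothetical)."""
    user: int
    kernel: int
    wait: int   # idle while I/O was outstanding (iowait)
    idle: int


def utilisation(before, after, wait_counts_as_busy):
    """Percentage of CPU time in use between two samples.

    With wait_counts_as_busy=True, iowait is treated as busy time, so an
    IO-bound machine looks almost fully utilised and almost never idle.
    """
    d_user = after.user - before.user
    d_kernel = after.kernel - before.kernel
    d_wait = after.wait - before.wait
    d_idle = after.idle - before.idle
    total = d_user + d_kernel + d_wait + d_idle
    if total == 0:
        return 0.0
    busy = d_user + d_kernel + (d_wait if wait_counts_as_busy else 0)
    return 100.0 * busy / total


def swap_overcommit(active_swap_mb, physical_mem_mb):
    """Ratio of swap in use to physical memory.

    A ratio near or above 1.0 suggests memory is already heavily committed,
    however much free swap remains.
    """
    return active_swap_mb / physical_mem_mb
```

With `before = CpuTicks(0, 0, 0, 0)` and `after = CpuTicks(user=20, kernel=10, wait=65, idle=5)`, utilisation is 30.0 when iowait counts as idle and 95.0 when it counts as busy. For the 7G of active swap against 7G of memory described above, `swap_overcommit(7 * 1024, 7 * 1024)` is 1.0.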
So for Solaris, the default LSF load indices are mostly useless straight out
of the box.
Now, LSF allows the administrator to override the built-in load indices and
define site-specific load indices that are better suited to local
conditions. What I find surprising is that neither Sun (who recommend LSF)
nor Platform Computing (who sell LSF) offers any OS-specific recommendations.
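As a rough illustration of the site-specific route, here is a Python sketch of an external load information program (ELIM). It assumes the reporting format documented for later LSF releases: the program runs continuously and periodically writes one line to standard output containing the number of indices followed by name/value pairs. Whether LSF 3.0 expects exactly this format is an assumption to check against its own documentation. The index names realut and swpratio, the placeholder collect_indices, and the 15-second interval are all hypothetical; the indices would also have to be declared in the LSF configuration before they were accepted.

```python
import time


def format_elim_line(indices):
    """Format indices as one report line: count, then name/value pairs."""
    parts = [str(len(indices))]
    for name, value in indices.items():
        parts.append(name)
        parts.append(f"{value:g}")   # compact number, e.g. 30.0 -> '30'
    return " ".join(parts)


def collect_indices():
    """Placeholder for site-specific measurements (hypothetical names/values).

    A real version might compute utilisation with iowait counted as idle
    and a swap-to-memory ratio, rather than returning fixed numbers.
    """
    return {"realut": 30.0, "swpratio": 1.0}


def main(interval=15):
    # Report forever; flush so each line reaches the reader immediately.
    while True:
        print(format_elim_line(collect_indices()), flush=True)
        time.sleep(interval)


if __name__ == "__main__":
    main()
```

Keeping the formatting in a separate function means the output line can be checked without starting the endless reporting loop: `format_elim_line({"realut": 30.0, "swpratio": 1.0})` returns `"2 realut 30 swpratio 1"`.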
Has anyone been through this process and is willing to share their
recommendations on using LSF load indices?
A summary will follow.
Thanks,
--
/\ Geoff. Lane. /\ Manchester Computing /\ Manchester /\ M13 9PL /\ England /\
"Bother", said Pooh, as he deleted his root directory.

