Determining the Health of a Customer's System

2007-12-24 23:53:00

I occasionally have cause to telnet into one of our customers' Sun systems...

Usually the system is...
* Literally on the far side of the world.
* Operating in real time with hundreds or thousands of
the customer's customers depending on it.

What can I do to determine the health or otherwise of the system while I'm in there?

My personal checklist is...

Check disk space...
df -k
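
To avoid eyeballing every line, the output can be filtered for nearly full filesystems. This is only a sketch: the function name and the 90% default are my own assumptions, and it expects the usual layout with capacity in column 5 and the mount point in column 6.

```shell
# Hypothetical helper: list filesystems at or above a capacity threshold.
# Reads `df -k` output on stdin; $1 is the threshold in percent (default 90).
# Assumes one line per filesystem; entries whose long device name wraps
# onto a second line are silently skipped.
full_filesystems() {
    threshold="${1:-90}"
    # "$5 + 0" turns a field such as "95%" into the number 95.
    awk 'NR > 1 && $5 + 0 >= '"$threshold"' { print $6, $5 }'
}

# Usage: df -k | full_filesystems 90
```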

Check the load with "top", looking at %idle and %kernel time. (I regard <10% idle and >20% kernel as red flag conditions.)
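
When "top" is not installed, roughly the same thresholds can be applied to vmstat output. This is a sketch under assumptions: the function name is made up, it treats vmstat's "sy" column as a stand-in for top's %kernel, and it relies on the Solaris layout where the last three columns are us, sy and id.

```shell
# Hypothetical helper: apply the <10% idle / >20% kernel thresholds to
# `vmstat` output read from stdin. Only the last sample is checked.
# Assumes the last three columns are us, sy, id (Solaris layout);
# other systems may order the columns differently.
cpu_red_flags() {
    awk '{ sys = $(NF - 1); idle = $NF }
         END {
             if (idle + 0 < 10) print "RED FLAG: idle " idle "%"
             if (sys + 0 > 20)  print "RED FLAG: kernel " sys "%"
         }'
}

# Usage: vmstat 5 2 | cpu_red_flags
# (the first vmstat line is an average since boot, so take two samples)
```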

Sort "top" by size (press "o", then type "size") and check the size of the application: has it been growing, etc.

Check for fd leaks with
/usr/proc/bin/pfiles pid
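
A leak shows up as a descriptor count that keeps climbing, so it helps to reduce the pfiles output to a single number and compare it across visits. A minimal sketch, with a made-up function name, assuming each descriptor entry starts with "<fd>: S_IF..." as Solaris pfiles prints it:

```shell
# Hypothetical helper: count open file descriptors in `pfiles` output
# read from stdin. The header line ("<pid>: command") and the rlimit
# line do not contain "S_IF" after the number, so they are not counted.
count_fds() {
    grep -c '^ *[0-9][0-9]*: S_IF'
}

# Usage ("$pid" is a placeholder for the application's process id):
#   /usr/proc/bin/pfiles "$pid" | count_fds
```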

Check for funny things happening in the network world...
netstat -a

Lots of connections in the TIME_WAIT state is a red flag to me. (This is not a web server app.)
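
Counting is easier than scrolling. The sketch below uses a hypothetical function name and assumes the connection state is the last field on each TCP line, which is how Solaris netstat prints it; what counts as "lots" is left to judgement.

```shell
# Hypothetical helper: count connections in a given TCP state.
# Reads `netstat -a` output on stdin; $1 is the state (default TIME_WAIT).
# Assumes the state is the last whitespace-separated field on the line.
count_tcp_state() {
    state="${1:-TIME_WAIT}"
    awk '$NF == "'"$state"'" { n++ } END { print n + 0 }'
}

# Usage: netstat -a | count_tcp_state TIME_WAIT
```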

Check for funny things happening to the hardware...
dmesg | less
Arcane mutterings about Bad Things happening to SCSI devices or Memory being red flag conditions.

Run "last" to see if unexpected logins are occurring at odd times.

cd /var/log
and check syslog for untoward happenings.

cd /var/adm
and check messages for Bad Things.
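
For the dmesg output and the log files above, a first pass with grep narrows things down before reading in detail. This is only a sketch: the function name and the word list are my own guesses at what counts as a Bad Thing, and it assumes a grep that accepts multiple -e patterns (on Solaris that may mean /usr/xpg4/bin/grep).

```shell
# Hypothetical helper: print only the lines that look like trouble.
# Reads log text on stdin; matching is case-insensitive.
# The keyword list is an assumption and should be tuned per system.
red_flag_lines() {
    grep -i -e 'warning' -e 'error' -e 'fail' -e 'panic'
}

# Usage: dmesg | red_flag_lines
#        red_flag_lines < /var/adm/messages
```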

Check the application's directory for core files.
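
If the application has subdirectories, find saves walking them by hand. A small sketch with a made-up function name; it assumes core files are named "core" or "core.<something>", which depends on how core dumps are configured on the system.

```shell
# Hypothetical helper: list core files under a directory tree.
# $1 is the directory to search (default: current directory).
# Matches regular files named "core" or "core.*" only.
find_cores() {
    find "${1:-.}" -type f \( -name core -o -name 'core.*' \) -print
}

# Usage: find_cores /path/to/application
```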

What other things to do and/or read can Older and/or Wiser heads suggest?

--

John Carter Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : john.carter at tait.co.nz
New Zealand

Good Ideas:
Ruby - http://www.ruby-lang-org - The best of perl,python,scheme without the pain.
Valgrind - http://developer.kde.org/~sewardj/ - memory debugger for x86-GNU/Linux
Free your books - http://www.bookcrossing.com
