Determining the Health of a Customer's System
2007-12-24 23:53:00
Usually the system is...
* Literally on the far side of the world.
* Operating in real time with hundreds or thousands of
the customer's customers depending on it.
What can I do to determine the health or otherwise of the system while I'm in there?
My personal checklist is...
Check disk space...
df -k
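A minimal sketch of how the disk space check could be automated. The function name flag_full_filesystems and the 90% default threshold are my own assumptions, not part of the original checklist. It expects "df -k" style output where each filesystem sits on one line with capacity in the fifth column and the mount point in the sixth.

```shell
# Print "mountpoint capacity" for every filesystem at or above a threshold.
# Reads `df -k` style output on stdin. The 90% default is an assumed value.
flag_full_filesystems() {
    threshold="${1:-90}"
    awk -v limit="$threshold" '
        NR > 1 {                       # skip the header line
            use = $5
            sub(/%/, "", use)          # "93%" -> "93"
            if (use + 0 >= limit + 0)
                print $6, $5
        }'
}
# Usage: df -k | flag_full_filesystems 90
```

Long device names can make some df implementations wrap a filesystem onto two lines, in which case the column positions shift and this sketch would miss that entry.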
Check loading with "top", looking at %idle and %kernel time. (I regard <10% idle and >20% kernel as red flag conditions.)
Sort "top" by size (osize) and check the size of the application: has it been growing, etc.?
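The load thresholds above can be written down as a small check. This is only a sketch: the function name cpu_red_flag is hypothetical, and it assumes either condition on its own counts as a red flag. The idle and kernel percentages are read off "top" by hand and passed in as arguments.

```shell
# Succeed (exit status 0) when the CPU figures look like a red flag:
# idle below 10% or kernel time above 20%.
# $1 = idle percentage, $2 = kernel percentage (decimals are fine).
cpu_red_flag() {
    awk -v idle="$1" -v kernel="$2" \
        'BEGIN { exit !((idle + 0 < 10) || (kernel + 0 > 20)) }'
}
# Usage: cpu_red_flag 7.5 12 && echo "CPU red flag"
```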
Check for fd leaks with
/usr/proc/bin/pfiles pid
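To spot a leak, it helps to turn the pfiles output into a single number and compare it across visits. The sketch below assumes the usual Solaris pfiles layout, where each open descriptor starts a line of the form "N: S_IF...". The function name count_open_fds is hypothetical.

```shell
# Count open file descriptors in pfiles output read from stdin.
# Assumes each descriptor line looks like "   3: S_IFREG mode:0644 ...".
count_open_fds() {
    grep -c '^ *[0-9][0-9]*: S_IF'
}
# Usage: /usr/proc/bin/pfiles "$pid" | count_open_fds
```

A count that keeps climbing between checks, while the workload stays roughly the same, is the thing to look for.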
Check for funny things happening in the network world...
netstat -a
Lots of connections in the TIME_WAIT state are a red flag to me. (This is not a web server app.)
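Rather than eyeballing the whole listing, the connections can be tallied by state. This is a rough sketch, assuming the TCP state is the last field on each connection line. The function name count_tcp_states is hypothetical, and the list of state names covers the common spellings rather than every platform.

```shell
# Tally TCP connections by state from netstat output on stdin.
# Assumes the state is the last field of each connection line.
count_tcp_states() {
    awk '
        $NF ~ /^(ESTABLISHED|TIME_WAIT|CLOSE_WAIT|FIN_WAIT_?[12]|LISTEN|SYN_SENT|SYN_RECV|SYN_RCVD|LAST_ACK|CLOSING)$/ {
            n[$NF]++
        }
        END { for (s in n) print n[s], s }' | sort -rn
}
# Usage: netstat -an | count_tcp_states
```

How many TIME_WAIT entries count as "lots" depends on the application, so the sketch only reports the numbers and leaves the judgement to you.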
Check for funny things happening to the hardware...
dmesg | less
Arcane mutterings about Bad Things happening to SCSI devices or Memory being red flag conditions.
Run "last" to see if unexpected logins are occurring at odd times.
cd /var/log
and check syslog for untoward happenings.
cd /var/adm
and check messages for Bad Things.
Check the application's directory for core files.
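The core file check might look something like this. The function name find_core_files is hypothetical, and the sketch assumes core files are named "core" or "core.something", which depends on how the system's core file naming is configured.

```shell
# List files named "core" or "core.*" under the given directory.
# $1 = the application's directory.
find_core_files() {
    find "$1" -type f \( -name core -o -name 'core.*' \) -print
}
# Usage: find_core_files /path/to/application
```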
What other things to do and/or read can Older and/or Wiser heads suggest?
--
John Carter Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : john.carter at tait.co.nz
New Zealand
Good Ideas:
Ruby - http://www.ruby-lang.org - The best of perl, python, scheme without the pain.
Valgrind - http://developer.kde.org/~sewardj/ - memory debugger for x86-GNU/Linux
Free your books - http://www.bookcrossing.com

