hung socket

2007-12-25 9:21:00

**** summary:

I received a couple of me-too's on this one, but no solutions to the

problem. Since my original posting, the product vendor has acknowledged

that there is a problem and is working on a fix for it.

**** thanks to:

Andrew Foote <acf@nabaus.com.au>

Jacques Rall <jacques.rall@za.eds.com>

Marc S. Gibian

**** answers:

> From: gibian@stars1.hanscom.af.mil

>

> I've been away from the office so I don't know if you've sent out a summary yet.

> Anyway, so far as I know, the only recovery path for a hung socket is a reboot.

> Let me add that hung sockets are not all that uncommon when I've run unattended

> ufsdumps over the LAN. This is why I strongly advise against use of backup

> products that use the OS' underlying tools for the actual tape handling. Some

> argue that they want to be able to restore without first installing the backup

> tool on a crashed system. My position is that you spend so much more time on the

> dump side that the slight overhead during recovery is far outweighed by the

> added reliability during dump.

>

> Hope this helps,

> Marc S. Gibian

> Telos Consulting Services phone: (617) 377-6350

> PRISM/TFS email: gibian@stars1.hanscom.af.mil

> From: Jacques Rall <jacques.rall@za.eds.com>

> What about using pmadm or sacadm? (sorry, don't know any switches)

>

> ----------

> From: ACF

>

> Me too !!

>

> I however am running proxy backups under AIX b/w RS/6000's. Like you,

> the only method I've found to "reset" the socket is by killing all

> associated processes.

>

> PDC do need to work on this as it's pretty dirty.

> Pls let me know how you go,

>

> Rgds,

> Midrange Services.

**** original question:

   SUN Sparc20 running Solaris2.5 with the 2.5 recommended patches installed.

   Problem description:

   This machine is a dedicated backup server that runs the PDC Budtool product.

   This product uses remote shelled dump/restore to back up the client

   machines. There appears to be a bug that gets "activated" when one of the

   backup clients either hangs or crashes while a dump is being run. The

   backup server keeps the socket connection open to the client that was being

   backed up. This socket will stay open until I manually kill the parent

   process on my backup server that initiated the remote dump.

   I'm working with the backup product vendor on a fix for this problem, but

   was hoping in the meantime to find a way to close this socket without

   killing the parent backup process. When I kill the parent process, none of

   the backups that still remain in the "backup schedule" will get run and the

   summary of the backup schedule will not get generated. I guess my basic

   question is: shouldn't a socket get closed when the destination machine is

   no longer accessible (e.g. no longer ping-able)?
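On the question above: a TCP connection with no unacknowledged data and no keepalive probes exchanges no packets while idle, so a peer that crashes without sending a FIN or RST can leave the local end in ESTABLISHED indefinitely. Keepalive has to be requested by the application that owns the socket. The sketch below shows what that request looks like in Python. It is illustrative only: `enable_keepalive` and its default values are hypothetical, the per-socket tuning options are platform-specific and may be absent, and nothing here says how goserver itself creates its sockets.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Ask the kernel to probe an idle TCP connection (hypothetical helper).

    idle, interval and count are illustrative values, not recommendations.
    Returns True if SO_KEEPALIVE reads back as enabled.
    """
    # Portable part: turn keepalive on for this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Optional per-socket tuning. These constants do not exist on every
    # platform, so each one is looked up and skipped if missing or rejected.
    tuning = (
        ("TCP_KEEPIDLE", idle),       # seconds of idleness before first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", count),       # failed probes before the connection drops
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is None:
            continue
        try:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
        except OSError:
            pass  # the system-wide default stays in effect

    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
```

Where the per-socket options are unavailable, only the system-wide keepalive timer applies, and that timer is usually long (on the order of hours), so even a keepalive-enabled socket may take a while to notice a dead peer.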

   Attached is some info that will hopefully clarify my problem description.

   All of the commands have been run from the backup server (of course, since

   the client is inaccessible!):

   backupsvr: lsof | grep client

   goserver 2095 root 11u inet 0xf611fec0 0t5 TCP backupsvr:1020->client.bms.com:shell

   backupsvr: netstat -a | grep client

   backupsvr.1020 client.bms.com.shell 61315 0 8760 0 ESTABLISHED

   backupsvr: ping client 1

   no answer from client.bms.com

   (NOTE: goserver is the "parent" process which controls the backup schedule

   and initiates the remote dump command)

   backupsvr: /usr/ucb/ps auxw | grep "goserver -x"

   root 2095 0.0 4.3 3272 2676 ? S Jan 05 286:19 /usr/budtool/bin/solaris_sparc/goserver -x0

   backupsvr: truss -aef -p 2095

   2095: psargs: /usr/budtool/bin/solaris_sparc/goserver -x0

   2095: getmsg(12, 0xEFFF87F8, 0xEFFF87EC, 0xEFFF8804) (sleeping...)
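The check done by hand above (a connection still ESTABLISHED to a host that no longer answers) could be scripted so that stale connections are flagged automatically. The following is a rough sketch, not a tested tool: `established_to` is a hypothetical name, and the column layout (local address, remote address, Swind, Send-Q, Rwind, Recv-Q, state) is assumed from the single netstat line shown above.

```python
def established_to(netstat_text, host):
    """Return ESTABLISHED connections whose remote host equals `host`.

    Assumes seven whitespace-separated columns per connection line:
    local, remote, Swind, Send-Q, Rwind, Recv-Q, state.
    """
    matches = []
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) != 7 or fields[6] != "ESTABLISHED":
            continue  # header, blank line, or another connection state
        # The remote address is "host.port"; the port follows the last dot.
        remote_host, _, remote_port = fields[1].rpartition(".")
        if remote_host != host:
            continue
        try:
            send_q, recv_q = int(fields[3]), int(fields[5])
        except ValueError:
            continue  # columns are not where this sketch expects them
        matches.append({
            "local": fields[0],
            "remote_port": remote_port,
            "send_q": send_q,
            "recv_q": recv_q,
        })
    return matches


sample = "backupsvr.1020 client.bms.com.shell 61315 0 8760 0 ESTABLISHED"
for conn in established_to(sample, "client.bms.com"):
    print(conn["local"], "->", conn["remote_port"], "send_q =", conn["send_q"])
```

A send queue of 0, as in the sample line, means the server has nothing waiting to be retransmitted, which is consistent with the connection never timing out on its own.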

   Any info on how to try and close this socket without killing the "goserver"

   process would be appreciated. Thanks!

   --

   Christopher M. Murphy email: murphy@bms.com

   Bristol Myers Squibb phone: (609) 252-5741

   Scientific Information Systems fax: (609) 252-6163

   Princeton NJ


