SUMMARY: Programs hang that access /etc/.name_service_door?
2007-12-24 23:23:00
Thanks to everyone who replied to my question.
The winning Kudo goes to Doug Winter for correctly spotting that the cause is
the following line in /etc/ssh/ssh_prng_cmds:
"arp -a -n" /usr/sbin/arp 0.02
The arp command is just one of many commands OpenSSH runs to gather entropy
for its randomizer, and this particular command was hanging. The -n option
isn't supported under Solaris, and there were a number of hosts in the arp
table that didn't have a name in reverse DNS, so arp presumably stalled on
each of those lookups. (In fact, this problem was already reported on the
archived openssh-unix-dev mailing list... Silly me.)
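For anyone chasing a similar hang, here is a minimal sketch of how each entropy command could be timed on its own to see which one stalls. It is not what I actually ran; the helper name time_command and the 5-second default are my own assumptions, and the demo uses the Python interpreter itself as a stand-in command so that it runs anywhere.

```python
import subprocess
import sys
import time


def time_command(argv, timeout=5.0):
    """Run argv and return (elapsed_seconds, timed_out).

    Hypothetical helper: output is discarded, only the wall-clock time
    and whether the timeout was hit are reported.
    """
    start = time.monotonic()
    try:
        subprocess.run(
            argv,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
            check=False,
        )
        timed_out = False
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this.
        timed_out = True
    return time.monotonic() - start, timed_out


if __name__ == "__main__":
    # Stand-in commands; on the affected host one would substitute the
    # real ones, e.g. ["/usr/sbin/arp", "-a"].
    candidates = [
        [sys.executable, "-c", "pass"],
        [sys.executable, "-c", "import time; time.sleep(0.2)"],
    ]
    for argv in candidates:
        elapsed, timed_out = time_command(argv)
        flag = " (timed out)" if timed_out else ""
        print(f"{elapsed:7.2f}s{flag}  {' '.join(argv)}")
```

A command that reports a time near the timeout is a likely culprit.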
Commenting out the offending line from /etc/ssh/ssh_prng_cmds did the trick.
My ssh connect time went from 1:30 to 0:08.
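I made the change by hand in an editor, but the same edit can be sketched as a small script. This assumes the file treats lines starting with "#" as comments; the function name comment_out is hypothetical, and the "ls" line in the sample is made up for illustration (only the arp line is from my file).

```python
def comment_out(text, needle, marker="# "):
    """Prefix every line containing `needle` with a comment marker.

    Lines that are already commented are left alone, so running this
    twice gives the same result as running it once.
    """
    out = []
    for line in text.splitlines(keepends=True):
        already_commented = line.lstrip().startswith("#")
        if needle in line and not already_commented:
            out.append(marker + line)
        else:
            out.append(line)
    return "".join(out)


# Sample content: the first line is invented, the second is the
# offending entry from /etc/ssh/ssh_prng_cmds.
sample = (
    '"ls -alni /var/log" /usr/bin/ls 0.02\n'
    '"arp -a -n" /usr/sbin/arp 0.02\n'
)

if __name__ == "__main__":
    print(comment_out(sample, "/usr/sbin/arp"), end="")
```

To apply it for real, one would read the file, pass its contents through comment_out, and write the result back after saving a backup copy.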
Thanks!
Michael
> Hello all,
>
> I have just upgraded ssh on my Solaris 8 system and everything works
> wonderfully except on three systems. On these three, ssh hangs for 37
> seconds when connecting from one of them to anywhere else.
> I believe I have tracked this problem down, but I don't understand the cause.
>
> Using truss (with a rather extreme set of options: -f -a -e -l -d -tall -vall
> -xall -sall -mall -rall -wall -uall) I see the following:
>
> 18785/1: 1.4638 open64(0xFF226358, 0) = 6
> 18785/1: 0xFF226358: "/etc/.name_service_door"
>
> ...
>
> 18750/1: 1.5779 close(6) = 0
> 18785/1: door(6, 0xFFBED430) (sleeping...)
> 18750/1: waitid(0, 18785, 0xFFBED748, 03) (sleeping...)
> 18785/1: 38.0478 door(6, 0xFFBED430) = 0
> 18785/1: 38.0481 door(6, 0xFFBED4C8) = 0
>
> If I am reading this right, the timestamps show that between the close(6)
> (timestamp 1.5779) and the second door(6, 0xFFBED430) (timestamp 38.0478),
> there's a 37 second delay. According to the manual page for truss, the
> timestamps signify the completion of the command, which means, if I am correct,
> that the cause is that second door(6, 0xFFBED430) on /etc/.name_service_door.
>
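[Inline note: the gap described in the paragraph above can also be found mechanically. The sketch below scans truss -d style lines for the largest jump between consecutive timestamps. The function name largest_gap and the regular expression are my own assumptions about the line layout shown in this excerpt; lines without a timestamp, such as the "(sleeping...)" ones, are skipped, and the process ID is ignored.]

```python
import re

# pid/lwp prefix, then a seconds.fraction timestamp, then the call text.
STAMPED = re.compile(r"^\s*\d+/\d+:\s+(\d+\.\d+)\s+(.*)$")


def largest_gap(lines):
    """Return (gap_seconds, call_text) for the biggest timestamp jump.

    The call text is the line whose completion ended the gap. Returns
    None if fewer than two timestamped lines are found.
    """
    prev = None
    best = None
    for line in lines:
        match = STAMPED.match(line)
        if not match:
            continue  # untimed line, e.g. "(sleeping...)" or a string dump
        stamp = float(match.group(1))
        if prev is not None:
            gap = stamp - prev
            if best is None or gap > best[0]:
                best = (gap, match.group(2))
        prev = stamp
    return best


# The excerpt from this message, with the elided "..." part left out.
trace = """\
18785/1: 1.4638 open64(0xFF226358, 0) = 6
18785/1: 0xFF226358: "/etc/.name_service_door"
18750/1: 1.5779 close(6) = 0
18785/1: door(6, 0xFFBED430) (sleeping...)
18750/1: waitid(0, 18785, 0xFFBED748, 03) (sleeping...)
18785/1: 38.0478 door(6, 0xFFBED430) = 0
18785/1: 38.0481 door(6, 0xFFBED4C8) = 0
""".splitlines()

if __name__ == "__main__":
    gap, call = largest_gap(trace)
    print(f"{gap:.2f}s before: {call}")
```

[On this excerpt the largest jump is about 36.47 seconds, ending at the door(6, 0xFFBED430) call, which matches the reading above.]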
> This only happens on these three hosts. So my question is:
>
> (1) is my diagnosis correct in thinking that the problem is with
> /etc/.name_service_door?
>
> and
>
> (2) what uses /etc/.name_service_door?
>
> (This is confusing my (l)users into thinking that there's something wrong with
> their account, and they're griping to me about it.) I'd like to restart
> whatever service is causing the slowdown, but I don't know what it is, and
> there is no mention of .name_service_door in any of the Answerbook2 libraries
> or man pages. (I thought I would be slick and look at the inode of
> /etc/.name_service_door and then look for that inode in /proc/*/fd/*, but
> there are an awful lot of programs that have something open to that door!)
>
> Any ideas anyone? Should I just "punt" and reboot them?
>
> Thanks for your input,
>
> Michael Peek
>
>
> Michael Peek peek at tiem.utk.edu
> ------------------------------------------------------------------------------
> Systems Administrator / C++ Database Programmer 569 Dabney Hall
> Department of Ecology and Evolutionary Biology Knoxville, TN 37996-1610
> University of Tennessee at Knoxville
> ------------------------------------------------------------------------------
> (865)974-0224 phone, (865)974-3067 fax http://www.tiem.utk.edu/~peek

