Running out of ptys

2007-12-25 7:23:00

Folks,

        I'd like to take a little time to describe a problem that was
troubling us for a couple of months until we finally tracked it down
yesterday afternoon.

        We have a Sun 4/470 running SunOS 4.1_PSR_A. About two months ago,
we finally had enough users logging in that we started to run out of pseudo
terminals. No problem, right? Just make a few more. My boss did just
that; he made a few more...and a few more...and more...

        Unfortunately, making the new ptys didn't help at all. After
about 35 or so users were logged in, subsequent users trying to log in got
the message ``all network ports in use'' or ``rlogind: out of ptys''. We
knew something was wrong, since there were plenty of ptys (96 at the
time). So, after some digging around in the old sun-managers messages, we
attributed the problem to the old pty disease, where some ptys were not
released properly and were left hung. We tried and tried to get Sun to
help us. The only thing we heard from them was that the ``pty disease''
problem had supposedly been fixed after SunOS 4.0. They were of no other
help; they essentially denied that there was a problem by saying ``We've
never heard of that problem.'' And I never heard back from them.

        Well, yesterday, my boss (Phil Draughon) and I decided it was time
to figure out what was going wrong. A few days earlier, I had suggested that
we try increasing the value of MAXUSERS in the kernel config file. While
neither Phil nor I could think of a reason that it would help, we
thought we'd try it anyway. It was set at 32; we increased it to 128.
Before building the new kernel, I had been using netstat -m to keep an eye
on things, and had noticed that the ``out of ptys'' error started exactly
when the number of queues (under streams allocation) reached the maximum
value displayed (200). Kill a few processes to get the number of queues
below 200 and you could log in again. Seemed strange. It turned out to be
a wild goose chase. We installed and rebooted with the new kernel and
nothing changed...we had the exact same problems, except this time the
number of queues went up to 203.

        The next thing we thought of was the possibility of having too many
open files. We checked the kernel and decided that the limits were
plenty high enough.

        Because of Phil's previous experience with the problem, we knew
that the open()s in telnetd and rlogind were failing. Our question was
``WHY?!!?''. Unfortunately, whenever an open() fails in telnetd or
rlogind, they incorrectly assume that the reason is that you're out of
ptys. This is not -always- the case. In our case, we had 48 more. So, we
started trying to open some of the ptys. When we tried to open anything
above /dev/ptyrf, the 48th pty, we were rather surprised to get the error
``No such device or address''. Great, time to go digging into the kernel
again. After a bit of digging around, we found it. In the file
/sys/os/tty_ptyconf.c there is this conspicuous line:

#define NPTY 48 /* crude XXX */

        We couldn't believe it! Sun actually hard-coded into the kernel
the maximum number of ptys you could have, and they didn't even make it a
configurable option. At the very least, this should be in the config file.
After several minutes of cursing Sun under our breath, we upped it to 128,
rebuilt the kernel, and rebooted. Problem solved.

        I can't believe Sun wasn't able to tell us how to fix this. With a
little more research, we found that this limit is carried over from the old
BSD code, where it was set to 32.

        I must say, we're rather disappointed with Sun's software support
after this little fiasco.

===============================================================================

Christopher D. Nims Chris_Nims@nwu.edu
Distributed Systems Services
Academic Computing and Network Services: Northwestern University
