SUMMARY: Login Problem - No more logins

2007-12-24 19:33:00

Thank you to all who replied.
We could not find anything that solved the problem, and in the end we had
to reboot the server.

Here is some more post-mortem information gathered before the reboot:

1. # ls -l `which login`
-r-sr-xr-x 1 root bin 29512 Sep 9 1998 /usr/bin/login

2. # uptime
9:36am up 36 day(s), 22:43, 19 users, load average: 1.04, 1.46, 1.30

3. # more /etc/system
...
set pt_cnt=128
set npty=128

set rlim_fd_max = 4096
set rlim_fd_cur = 1024

4. Every login attempt failed, yet it was registered as a successful one
in /var/adm/wtmp. That is, all the logins shown as "still logged in" were
actually frozen: users could not log in, and after they typed their
password the session waited forever for a prompt:

# last | head

dgabel pts/51 yvrbk Wed Mar 14 09:28 still logged in
dgabel pts/50 yvrbk Wed Mar 14 09:27 still logged in
dgabel pts/49 yvrbk Wed Mar 14 09:26 still logged in
root pts/48 mail Wed Mar 14 09:26 still logged in
kdesrosi pts/46 172.22.2.216 Wed Mar 14 09:23 still logged in
wlau console Wed Mar 14 09:21 still logged in
kdesrosi pts/45 172.22.2.216 Wed Mar 14 09:19 still logged in
root pts/44 mail Wed Mar 14 09:09 still logged in
wlau pts/42 mail Wed Mar 14 09:08 still logged in
wlau pts/39 172.16.4.102.gt. Wed Mar 14 09:06 still logged in

5. No user could log in, whether the user ID was in the local /etc/passwd
or in NIS.

6. If the user typed a wrong password at a login attempt, the login
process detected it and replied with "Login incorrect":

{remote_host}: # telnet int
Trying 172.17.2.3...
Connected to int.gt.ca.
Escape character is '^]'.

SunOS 5.6

login: mgreene
Password:
Login incorrect

7. ftp was still working without error from any remote host

8. The NFS server was still running without problems; filesystems could
be shared from the system and NFS-mounted remotely.

Here is a sample from /var/adm/messages:
Mar 14 10:58:25 int mountd[405]: MOUNT: bkup mounted /u01
Mar 14 11:00:00 int mountd[405]: MOUNT: arka mounted /u01
Mar 14 11:00:01 int mountd[405]: MOUNT: dev01 mounted /u01
Mar 14 11:09:24 int mountd[405]: UNMOUNT: bkup unmounted /u01
Mar 14 11:10:30 int mountd[405]: UNMOUNT: arka unmounted /u01

9. No filesystem was full, and there was no message in /var/adm/messages
or /var/log/syslog reporting any kind of error - kernel parameters,
system processes, filesystems, devices, ...
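
The symptom in item 4 can be spotted quickly by counting, per user, the
sessions that last reports as "still logged in"; several entries for one
user within a few minutes suggest repeated hung attempts. A minimal
sketch, assuming the last output format shown above (the function name
count_open_sessions is made up for illustration):

```shell
# Count sessions that `last` reports as "still logged in", per user.
# Reads `last` output on stdin, so it also works on saved output.
# count_open_sessions is a hypothetical helper name, not a system command.
count_open_sessions() {
    awk '/still logged in/ { n[$1]++ }
         END { for (u in n) print u, n[u] }' | sort
}

# Usage on a live system:
#   last | count_open_sessions
```

This only counts wtmp entries; it does not tell whether a shell was ever
started for those sessions.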

Here are the replies:

Rasana Atreya <rasana_atreya at hotmail.com>
----------------------------------------------
Hi Marco,

Could it be an NIS problem? Or maybe NFS? Is the home directory of the
user trying to login mounted from elsewhere - maybe the mount failed?
Can root login? Can root telnet/ftp in? Can the problem user login
through the console/telnet in/ftp in? Did you make any changes before
the problem started?

Does /var/adm/messages point to anything?

Rasana

"Perrier,Kent - PLANO" <kent.perrier at Oneco.net>
-----------------------------------------------------
What about the amount of memory and swap? How many ptys are there?
Are there any messages in /var/adm/messages or syslog?

Kent

David Evans <David.J.Evans at oracle.com>
--------------------------------------------
Check how many ttys are in use. You probably need to raise the tty
count and this is in the FAQ for this group and Casper Dik's Solaris FAQ.

The who and ps commands will return different data, use the ps command.

The other item is you may have a stale NFS file handle. This will be
apparent by the logs and users complaining of slow response with a
final message of either a machine has timed out or they have a stale
file handle.

Or you may have a network/routing issue.

So there are a few things to check.

David
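
The pty check suggested in this reply can be sketched as follows. It is
only an illustration: the function names are hypothetical, `ps -e -o tty=`
is assumed to be available (it is the POSIX form), and the configured
limit is read from /etc/system-style text on stdin.

```shell
# Count distinct pseudo-terminals that have at least one process attached.
# Expects one tty name per line on stdin, e.g. from `ps -e -o tty=`.
# count_ptys and configured_pt_cnt are hypothetical helper names.
count_ptys() {
    awk '$1 ~ /^pts\// { seen[$1] = 1 }
         END { n = 0; for (t in seen) n++; print n }'
}

# Print the pt_cnt value set in /etc/system-style text read from stdin.
# Lines commented out with "*" are ignored; prints nothing if unset.
configured_pt_cnt() {
    awk -F= '/^[ \t]*set[ \t]+pt_cnt[ \t]*=/ { v = $2; gsub(/[ \t]/, "", v) }
             END { if (v != "") print v }'
}

# Usage on a live system:
#   ps -e -o tty= | count_ptys
#   configured_pt_cnt < /etc/system
```

If the first number is close to the second, running out of ptys becomes a
plausible explanation.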

Marco,

I'd revise my last email in light of your follow-up.

Truss the login and dt daemons (use the -f option to follow any
child processes) and do a snoop at the same time. This will probably
show a request that's waiting for a return and at that point you'll
have an idea of what's happening. But the why may be harder to find.

David

"Fletcher, Joe" <joe.fletcher at Metapack.com>
-----------------------------------------------
Possibilities are max number of processes reached, run out of terminals (if
someone logs out can another user log in?), run out of something else. What
do ps, vmstat, df etc suggest? Is this machine a NIS client and if yes is
the connection to the server ok?

Thomas Knox <knoxth at cch.com>
--------------------------------
Is your NIS/YP server OK?

Cian O'Sullivan <ciano at parthus.com>
---------------------------------------
Marco,

you may be out of telnet sessions.

Check to see the number of sessions open right now.

you can do this by typing who at the command prompt.

If you are near 48, then you need to add more.

this is done by adding

set pt_cnt=XXX where XXX is the number of connections.

Add this to the /etc/system file. You must reboot with the -r option.

reboot -- -r

Or you can touch /reconfigure and then do a reboot.

Both will do the same.

Good luck

Cian

Jim Taylor <Jimt at allconnect.com>
------------------------------------
Could be lots of different things, but try this...

Check to see number of processes on the machine. Only 14
users including you, so it is unlikely. But, some applications
can go rogue and spawn processes until they can't spawn any
more. Especially home-grown. Is this a development box?

Do a ps -ef|wc -l to get # processes on box.

Good Luck,

Jim Taylor
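
The process count suggested in this reply can be narrowed down per user,
which makes a runaway application easier to spot. A minimal sketch,
assuming `ps -e -o user=` is available (the POSIX form); the function
name procs_per_user is hypothetical:

```shell
# Print "user count" pairs, busiest user first.
# Expects one user name per line on stdin, e.g. from `ps -e -o user=`.
# procs_per_user is a hypothetical helper name.
procs_per_user() {
    awk 'NF { print $1 }' | sort | uniq -c | sort -rn |
        awk '{ print $2, $1 }'
}

# Usage on a live system:
#   ps -e -o user= | procs_per_user
```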

Doug Otto <doug.otto at npawest.com>
----------------------------------------
check his .login/.profile/.cshrc (depending on his shell) it may be a
script error

Ravi Channavajhala <ravi.channavajhala at csfb.com>
------------------------------------------------------
You may have run out of ptys. Refer to solaris2 faq.

-ravi

Derrick Daugherty <derrick at tachyon.pointone.com>
-------------------------------------------------------
comments inline

It's rumored that around Wed, Mar 14, 2001 at 12:41:19PM -0500
Marco Greene <marco-greene at home.com> wrote:

> It seems that nothing is wrong.
> It is very strange that every attempt to login is shown in the "last"
> messages as a
> successful login:
>
> dgabel sessions are frozen ones - that is he could not login but it
> shows like successfully:
>
> int:/# last | head
>
> dgabel pts/51 yvrbk Wed Mar 14 09:28 still logged in
> dgabel pts/50 yvrbk Wed Mar 14 09:27 still logged in
> dgabel pts/49 yvrbk Wed Mar 14 09:26 still logged in

what does a `w` show? does he have a shell running? It's possible the
shell isn't able to spawn.. in that case, what are the perms on login?

ls -l `which login`

If you truss -f -o failed-logins -p <pid of telnetd> and then try to
login then kill the truss..that 'failed-logins' file should
show where it's having the issues. Feel free to attach the file. I may
be leaving shortly but i'll try and look at it tonight or tomorrow.

curiosity, what does netstat -nf inet |wc -l return?

> Here are some more details from a session where I'm still logged in:
> int:/# ps -ef | grep inet
> root 353 1 1 Feb 05 ? 192:49 /usr/local/sbin/rinetd

just to make sure...this doesn't have any rules for port 23 does it?

> root 21163 1 0 09:25:30 ? 0:00 /usr/sbin/inetd -s

use that pid for the truss

and `df -e /` just to make sure...

hope that helps,
-derrick
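
Once a truss -f output file such as the 'failed-logins' file mentioned in
this reply exists, the last call recorded for each process is a
reasonable first place to look for the hang. The sketch below assumes
that each line of truss -f output starts with the process ID followed by
a colon; the function name last_call_per_pid is hypothetical:

```shell
# Print the last line recorded for each PID in `truss -f` output.
# Assumes every line begins with "<pid>:" as truss -f writes it.
# Reads the truss output on stdin; last_call_per_pid is a hypothetical name.
last_call_per_pid() {
    awk -F: '/^[0-9]+:/ { last[$1] = $0 }
             END { for (p in last) print last[p] }' | sort -n
}

# Usage:
#   last_call_per_pid < failed-logins
```

A process whose final line is a call that never returned is a candidate
for where the login is stuck, though the reason may still need digging.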

Here is my original post:

> Hello Admins,
>
> We have a login problem right now on one of our servers - SUN/Solaris
> 2.6.
>
> It does not allow any new logins - anyone who was already logged in
> is fine and can work
> without any problem.
>
> If any new user tries to login he cannot - after typing username and
> password the login window is waiting forever.
>
> Is there a known bug? or is there a patch that is needed?
>
> The system is up for:
>
> 8:31am up 36 day(s), 21:39, 14 users, load average: 1.55, 1.41, 1.31
>
> Any help will be appreciated.
>
> thanks in advance,
> Marco
