Problem with aspppd(1M) on Solaris 2.6

2007-12-25 11:37:00

I apologize for taking so long to produce this summary, but the problem
proved difficult to solve and I have only just nailed down the complete
solution.

The original problem:

-------------------------------------------------------------------------------

I have been using a ppp setup successfully on Solaris 2.4, 2.5 & 2.5.1.
I recently upgraded to Solaris 2.6 and started having a problem where
aspppd constantly redials and reconnects to my ISP.

I have a 5 minute inactivity timeout set which works as expected. However,
10-15 minutes after the link is dropped, my system reconnects. I have been
unable to determine what is causing this, but the result is that I am
racking up huge amounts of totally unnecessary daily connect time. I have
had to resort to manually stopping aspppd and then manually starting and
stopping it each time I need a connection. This is a real pain and I
would like to get the system back to its old reliable self, where aspppd
sits quietly until some explicit connection request is made, times out
as expected, and then sits quietly until the next explicit request.

The configuration:

        SPARCstation 20/61
        Running the standard aspppd(1M) bundled with Solaris 2.6
        DNS (in.named) not active on this system
        Connecting to worldblazer modem on /dev/ttya

[...more description deleted...]

-------------------------------------------------------------------------------

The solution:

-------------------------------------------------------------------------------

The problem turned out to be a combination of the following two things:

1: Thanks to Patrick Bigos <patrick.bigos@sun.com> at Sun Customer Support
    and to Joe Garbarino <jgarb@erim-int.com> for suggesting this solution.

    I started monitoring nscd(1M) activity by uncommenting the "logfile"
    line, setting the "debug-level" to 10 in /etc/nscd.conf, and
    restarting the daemon (cd /etc/init.d ; ./nscd stop ; ./nscd start).
    I also ran snoop(1M) on the connection (snoop -d ipdptp0 | tee logfile).

    What I found was that nscd was generating "keepalive" operations on
    various remote sites which had recently been contacted. These
    operations triggered DNS lookups, which in turn forced the link to
    reconnect.

    Apparently the name service cache daemon (nscd(1M)) has the hosts cache
    enabled by default. In the /etc/nscd.conf file the line:

            # enable-cache hosts no

    is commented out. Removing the hash and then restarting the daemon as
    described above halted most of the spurious aspppd connections.

    Why this works is still a mystery to me. If anyone knows why host
    caching is causing keepalive connections to remote hosts, I would really
    appreciate hearing the reasons so I can have a better understanding of
    what nscd(1M) is actually doing. Also, what changes were made between
    Solaris 2.5.1 and 2.6 that caused this problem to show up now?

2: After implementing the above, I noticed that there were still a few
    aspppd connections occurring for no obvious reason. I tracked the
    problem down to a couple of find(1) commands which were being run
    nightly from root's crontab. After many hours of manual investigation,
    I discovered that there was a new Solaris 2.6 directory "/xfn/_x500"
    which I had never noticed before. Doing anything to this directory,
    such as listing its contents with ls(1), causes a remote connection.
    This xfn directory is part of the Federated Naming System (SUNWfns) and
    the _x500 subdirectory is probably created by the addition of the FNS
    Support For X.500 Directory Context (SUNWfnsx5) package.

    The immediate solution was to stop the find commands from descending
    into this directory. The better solution may be to remove this
    package if it is not required.
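
The pruning described in item 2 might be sketched as follows. The actual
find commands from root's crontab are not shown above, so the helper name
find_skip_dir, its arguments, and the -print action are assumptions for
illustration only; the sketch also assumes a find(1) that supports -prune.

```shell
#!/bin/sh
# find_skip_dir ROOT NAME  (hypothetical helper, not the original cron job)
# Walk ROOT but do not descend into any entry called NAME.
# Note: -name matches at every depth, so ALL entries named NAME are
# skipped, not just the one under /xfn.
find_skip_dir() {
    # When -name matches, -prune stops the descent and -print is skipped;
    # everything else falls through to -print.
    find "$1" -name "$2" -prune -o -print
}

# Example: list everything under / without touching /xfn/_x500
#   find_skip_dir / _x500
```

In a real cron job, -print would be replaced by whatever tests and actions
the original find command used.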

With these two changes, my machine has been sitting quietly for over 36 hours
now, so it appears that the problem is solved.
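
For the first change, the edit to /etc/nscd.conf could be scripted roughly
as below. This is a sketch only: enable_hosts_no is a hypothetical helper,
it writes to a copy rather than editing /etc in place, and it assumes a
sed that understands POSIX character classes such as [[:space:]] (an older
sed may need literal spaces and tabs instead).

```shell
#!/bin/sh
# enable_hosts_no INFILE OUTFILE  (hypothetical helper)
# Copy INFILE to OUTFILE, removing the leading hash from the
# "enable-cache hosts no" line. All other lines pass through unchanged.
enable_hosts_no() {
    sed 's/^#[[:space:]]*\(enable-cache[[:space:]][[:space:]]*hosts[[:space:]][[:space:]]*no[[:space:]]*\)$/\1/' "$1" > "$2"
}

# Example (as root, after reviewing the new file):
#   enable_hosts_no /etc/nscd.conf /tmp/nscd.conf.new
#   cp /tmp/nscd.conf.new /etc/nscd.conf
#   cd /etc/init.d ; ./nscd stop ; ./nscd start
```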

-------------------------------------------------------------------------------

Credits:

-------------------------------------------------------------------------------

I really want to extend my sincere thanks to the following people for
responding to my request for help. Many of the suggestions were very
useful in pointing me in the right direction to finally track down the
ultimate solution. (In particular, I discovered that the normal output
from snoop is *not* the same as the packet data saved to a file when
using the -o flag. This realization turned snoop into a useful monitoring
tool! :-))

    Joe Garbarino <jgarb@erim-int.com>
    Cheryl L. Southard <cld@astro.caltech.edu>
    Jonathan Loh <jloh@futon.sfsu.edu>
    Erwin Fritz <efritz@glja.com>
    John W. Funk <jwf@ccuc.on.ca>
    Daniel R. Falconer <drf@dedalus.net>
    Daniel Kluge <danielk@tibco.com>
    Richard Skelton <rich@brake.demon.co.uk>
    Casper Dik <casper@holland.Sun.COM>
    Martin Huber <hu@garfield.m.isar.de>
    Bob Bridgham <robert_bridgham@b-e-s-t.com>
    David Crane <david.crane@east.sun.com>
    Ken Corum <Sun Customer Support>
    Patrick Bigos <patrick.bigos@sun.com>

-------------------------------------------------------------------------------

Summary of suggestions:

-------------------------------------------------------------------------------

Here is a quick summary of the suggestions submitted by the people above.

Joe Garbarino <jgarb@erim-int.com>

    * Set the "keep-hot-count" entry to 0 in /etc/nscd.conf. The man
      page says:

        keep-hot-count cachename value
          This attribute allows the administrator to set the number of
          entries nscd(1M) is to keep current in the specified cache.
          value is an integer number which should approximate the number
          of entries frequently used during the day.

      This suggestion would probably also work, since it would likely be
      equivalent to disabling the cache.

Cheryl L. Southard <cld@astro.caltech.edu>

    * Touch the file /etc/notrouter. As the answerbook says:

        When the machine reboots, the startup script looks for the presence
        of the /etc/notrouter file. If the file exists, the startup script
        does not run in.routed -s or in.rdisc -r, and does not turn on IP
        forwarding on all interfaces configured "up" by ifconfig. This
        happens regardless of whether an /etc/gateways file exists.

    * Add a "default_route" line to the /etc/asppp.cf file.

Jonathan Loh <jloh@futon.sfsu.edu>

    * Turn on verbose aspppd logging by setting "debug_level" to a higher
      value (8 or 9) in /etc/asppp.cf.

Erwin Fritz <efritz@glja.com>

    * Reported having similar problems.

John W. Funk <jwf@ccuc.on.ca>

    * Try disabling routing discovery processes /usr/sbin/in.rdisc and
      /usr/sbin/in.routed.

      [I believe the presence of the /etc/notrouter file accomplishes this.]

Daniel R. Falconer <drf@dedalus.net>

    * Try stopping named(1M). [It was already disabled.]

    * Check for DNS resolution requests made to the remote nameserver
      (configured in /etc/resolv.conf).

    * Consider removing the "domain" line from /etc/resolv.conf to stop
      local machine name lookups from going off site.

    * Set up named just to perform local name resolution and to run in
      debug mode to monitor DNS traffic.

Daniel Kluge <danielk@tibco.com>

    * Use netstat and snoop to track down a hanging TCP connection, since
      TCP normally sends keepalive packets every 15 min, even if the
      connection has not been shut down correctly.

Richard Skelton <rich@brake.demon.co.uk>

    * Check that the network router discovery daemon (/usr/sbin/in.rdisc)
      is not running. You can comment it out in the file
      /etc/init.d/inetinit.

Casper Dik <casper@holland.Sun.COM>

    * Run snoop on the link; perhaps it's a DNS request or some such?

Martin Huber <hu@garfield.m.isar.de>

    * Add "norip ipdptp0" to the file /etc/gateways. This prevents the
      transmission of RIP information over the ppp line, which could
      cause unwanted connections. [This was already implemented.]

    * Try 'snoop ipdptp0' to see what is transmitted over the link.

Bob Bridgham <robert_bridgham@b-e-s-t.com>

    * Look for software that regularly checks a host or makes DNS
      requests, either of which will bring up ppp.

    * Use a sniffer to see what is causing packets to try to go outside
      the machine.
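
Several of the suggestions above come down to whether a file or a config
line is present. A rough way to check them in one pass is sketched below.
The function name report_tweaks and its ROOT argument are hypothetical,
and the "keep-hot-count hosts 0" form is assumed from the man page syntax
quoted above (keep-hot-count cachename value).

```shell
#!/bin/sh
# report_tweaks ROOT  (hypothetical helper)
# Report which of the suggested settings are present under ROOT.
# Use "/" for the live system, or a scratch directory for testing.
report_tweaks() {
    root=${1:-/}

    # /etc/notrouter: stops in.routed -s / in.rdisc -r at boot
    if [ -f "$root/etc/notrouter" ]; then
        echo "notrouter: present"
    else
        echo "notrouter: missing"
    fi

    # "norip ipdptp0" in /etc/gateways: no RIP over the ppp link
    if grep 'norip[[:space:]][[:space:]]*ipdptp0' "$root/etc/gateways" >/dev/null 2>&1; then
        echo "norip ipdptp0: present"
    else
        echo "norip ipdptp0: missing"
    fi

    # "keep-hot-count hosts 0" in /etc/nscd.conf (assumed line format)
    if grep '^[[:space:]]*keep-hot-count[[:space:]][[:space:]]*hosts[[:space:]][[:space:]]*0[[:space:]]*$' "$root/etc/nscd.conf" >/dev/null 2>&1; then
        echo "keep-hot-count hosts 0: present"
    else
        echo "keep-hot-count hosts 0: missing"
    fi
}

# Example:
#   report_tweaks /
```

The function only reads files; it makes no changes, so it is safe to run
before deciding which of the suggestions to apply.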

-------------------------------------------------------------------------------

Thanks again to everyone for their help. All of it was greatly appreciated.


--
Jeff Small C. Jeffery Small & Associates (206) 232-3338
jeff@cjsa.com 7000 E Mercer Way, Mercer Island, WA 98040
