Ultra 2/170 2GB root disk faulty?

2007-12-25 9:15:00

Hi all

  The consensus regarding the problem described below (confirming my
own thoughts) was that the SCSI chain is simply overloaded.

  I've re-formatted and analysed the disk, and that shows no problems
whatsoever. The root file system was recovered from tape, and I put
the system back online. We've had no further problems, but we HAVE
placed an immediate order for a second SCSI controller, which should
arrive in the next few days and will hopefully clear up the problems
for good.

  The range of replies suggests that there is a great deal of
confusion about exactly how to manage long SCSI chains. Everyone
agrees that the chain should be as short as possible and that the
wide drives should come first in the chain, but some issues are just
not well understood.

  1) Does a wide-to-narrow SCSI adapter cable properly terminate the
     'wide' pins that go no further? This has worried me from the
     moment we fitted it, and I suspect it does not.

  2) Someone suggested we synchronise the RPMs of the disk drives on
     the chain. Was this a serious suggestion?!

  3) Someone said that the speed of the chain depends entirely on the
     speed of the slowest device on it. I don't think this is true,
     because the SCSI adapter negotiates transfer rates with each
     device separately (unlike IDE).

  4) Someone suggested disabling tagged command queueing, but I don't
     think this would help. All the disks handle tagged command
     queueing without trouble; the CD-ROM and the tape drive are the
     only devices that don't.

  Thanks to the following for useful input:

   Al Hopper al@logical-approach.com
   Brad Young bbyoung@amoco.com
   Kevin Sheehan u-kevin@megami.veritas.com
   Rich Smith rc.smith@ibm.net

      Matt.


--
# -+- Matthew Reynolds, Contract Research Assistant -+- #
# -+- Aston Space Geodesy -+- #
# Email: reynolmd @ sun.aston.ac.uk Web: http://www.sat.aston.ac.uk/ #
# Phone: +44 (0)121-359-3611 x4552 Fax: +44 (0)121-333-3389 #

Original post:

------- Start of forwarded message -------
Return-path: <reynolmd@sun.aston.ac.uk>
Date: Wed, 20 Jan 1999 13:01:00 GMT
From: Matt Reynolds <reynolmd@sun.aston.ac.uk>
To: sun-managers@codeprof.ececs.uc.edu
CC: Matt Reynolds <reynolmd@sun.aston.ac.uk>, Phil <moorep@sun.aston.ac.uk>
Subject: Ultra 2/170 2GB root disk faulty?

Hi guys,

I'm looking for some advice on the following problem - hopefully
someone can confirm my theories on this.

We've got an Ultra 2/170 workstation running Solaris 2.6 with kernel
patch 105181-04. The internal disk is a 2.1 GB Seagate (I think!).
This machine has a FULL SCSI chain, all external apart from the boot
disk:

c0t0d0 SEAGATE ST32550W SUN2.1G 2G (SCSI II, wide) ****
c0t1d0 SEAGATE ST118273W 18G (SCSI III, wide)
c0t2d0 MICROP 1991-27 1128RQ 9G (SCSI II, wide)
c0t3d0 SEAGATE ST12400N SUN2.1G 2G (SCSI II, narrow)
c0t5d0 MICROP 1991-27 1128RF 9G (SCSI II, narrow)
c0t6d0 Plextor 4x CDROM drive (SCSI I, narrow)
c0t4d0 HP 12GB DAT C1537A (SCSI II, narrow)

The wide disks are first on the chain and the narrow drives are at
the end. The chain is terminated with a Sun narrow SCSI terminator,
and external cabling does not exceed 4 m. The internal disk is marked
**** in the list above.

None of the external drives have reported problems to the
system. Yesterday, I received the following error message:

Jan 19 14:54:38 geodesy
: WARNING: /sbus@1f,0/SUNW,fas@e,8800000 (fas0):
: Target 0 reducing sync. transfer rate
: WARNING: /sbus@1f,0/SUNW,fas@e,8800000/sd@0,0 (sd0):
: Error for Command: write(10) Error Level: Retryable
: Requested Block: 1070416 Error Block: 1070416
: Vendor: SEAGATE Serial Number: 02540364
: Sense Key: Aborted Command
: ASC: 0x47 (scsi parity error), ASCQ: 0x0, FRU: 0x3

Jan 19 14:55:51 geodesy
: fas: 0.0: cdb=[ 0x2a 0x0 0x0 0x10 0x54 0x50 0x0 0x1 0x0 0x0 ]
: fas: 0.0: cdb=[ 0x2a 0x0 0x0 0x10 0x56 0x50 0x0 0x1 0x0 0x0 ]
: fas: 0.0: cdb=[ 0x2a 0x0 0x0 0x10 0x57 0x50 0x0 0x1 0x0 0x0 ]
: fas: 0.0: cdb=[ 0x8 0x7 0xf1 0xee 0x2 0x0 ]
: fas: 0.0: cdb=[ 0x2a 0x0 0x0 0x10 0x55 0x50 0x0 0x1 0x0 0x0 ]
: WARNING: /sbus@1f,0/SUNW,fas@e,8800000 (fas0):
: Disconnected tagged cmd(s) (5) timeout for Target 0.0
: WARNING: /sbus@1f,0/SUNW,fas@e,8800000/sd@0,0 (sd0):
: SCSI transport failed: reason 'timeout': retrying command
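The cdb=[...] lines above are the raw SCSI command descriptor blocks that the fas driver printed for the commands that timed out. A minimal Python sketch that decodes them, handling only the two opcodes that appear in this log (0x2a WRITE(10) and 0x08 READ(6)); decode_cdb is a hypothetical helper written for this illustration, not part of any Solaris tool:

```python
# Sketch: decode the CDBs printed by the fas driver in the log above.
# Only the two opcodes seen in the log are handled:
#   0x2a WRITE(10): 32-bit LBA in bytes 2-5, 16-bit block count in bytes 7-8
#   0x08 READ(6):   21-bit LBA in the low 5 bits of byte 1 plus bytes 2-3,
#                   8-bit block count in byte 4 (0 means 256 blocks)


def decode_cdb(cdb):
    """Return (command name, LBA, block count) for a WRITE(10) or READ(6) CDB."""
    op = cdb[0]
    if op == 0x2A:
        lba = int.from_bytes(bytes(cdb[2:6]), "big")
        count = int.from_bytes(bytes(cdb[7:9]), "big")
        return ("WRITE(10)", lba, count)
    if op == 0x08:
        lba = ((cdb[1] & 0x1F) << 16) | (cdb[2] << 8) | cdb[3]
        count = cdb[4] or 256
        return ("READ(6)", lba, count)
    raise ValueError(f"opcode 0x{op:02x} is not handled in this sketch")


# The five CDBs copied from the 14:55:51 log entry, in the order printed.
LOG_CDBS = [
    [0x2a, 0x0, 0x0, 0x10, 0x54, 0x50, 0x0, 0x1, 0x0, 0x0],
    [0x2a, 0x0, 0x0, 0x10, 0x56, 0x50, 0x0, 0x1, 0x0, 0x0],
    [0x2a, 0x0, 0x0, 0x10, 0x57, 0x50, 0x0, 0x1, 0x0, 0x0],
    [0x8, 0x7, 0xf1, 0xee, 0x2, 0x0],
    [0x2a, 0x0, 0x0, 0x10, 0x55, 0x50, 0x0, 0x1, 0x0, 0x0],
]

if __name__ == "__main__":
    for cdb in LOG_CDBS:
        name, lba, count = decode_cdb(cdb)
        print(f"{name:10} LBA {lba:8d}  blocks {count}")
```

Decoded this way, the four WRITE(10) commands each cover 256 blocks, starting at LBAs 1070160, 1070416, 1070672 and 1070928 (consecutive ranges), and the one at 1070416 is the same block as the "Requested Block" in the earlier parity-error message, which is consistent with it being the retried write.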

This disk is now running at an 11.430 MB/sec SCSI transfer rate, which
is SCSI I, wide. 'scsiinfo' reports that the drive is 'noisy'.
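As a rough check on that figure, the sketch below converts a synchronous transfer rate into the period of one bus transfer. It assumes decimal megabytes (10^6 bytes, as SCSI speed ratings use) and one transfer of width_bytes bytes per period (2 bytes on a wide bus); sync_period_ns is a hypothetical name chosen for this illustration.

```python
# Sketch: relate a reported synchronous SCSI transfer rate to the bus
# transfer period. Assumes decimal megabytes (10**6 bytes) and one
# transfer of `width_bytes` bytes per REQ/ACK period.


def sync_period_ns(rate_mb_per_s, width_bytes):
    """Period in ns of one synchronous transfer at the given rate and bus width."""
    transfers_per_s = rate_mb_per_s * 1e6 / width_bytes
    return 1e9 / transfers_per_s


# Nominal periods for comparison (wide bus = 2 bytes per transfer):
#   SCSI-1 synchronous: 200 ns (5 MHz)  -> 10 MB/s wide
#   Fast SCSI:          100 ns (10 MHz) -> 20 MB/s wide
#   Ultra (Fast-20):     50 ns (20 MHz) -> 40 MB/s wide
if __name__ == "__main__":
    # Rate reported for sd0 after the driver reduced the sync rate.
    print(f"{sync_period_ns(11.430, 2):.1f} ns")
```

For sd0, 11.430 MB/sec on a wide bus works out to about 175 ns per transfer: well short of the 100 ns period behind the 20 MB/s Fast-Wide rating, and close to the 200 ns SCSI-1 synchronous period.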

Errors have been reported on this disk before, but they coincided
with a cooling fan failing in one of the external drives, which
caused problems on the SCSI chain, so I assumed that was why the
internal disk was re-syncing at a slower SCSI transfer rate.

These error messages seem to have coincided with a problem with a
system directory. Users discovered that they could no longer compile
software on the system, and I traced this to the following directory:

/usr/ccs/lib:
[snip]
?--------- 0 root root 0 Jan 1 1970 libcurses.a
?--------- 0 root root 0 Jan 1 1970 libform.a
?--------- 0 root root 0 Jan 1 1970 libgen.a
?--------- 0 root root 0 Jan 1 1970 libl.a
?--------- 1 root other 4294967297 Jan 1 1970 libld.so.2
-rwxr-xr-x 1 bin bin 110696 May 5 1998 liblddbg.so.4
?--------- 0 root root 0 Jan 1 1970 libmalloc.a
?--------- 0 root root 0 Jan 1 1970 libmenu.a
[snip]
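The damaged entries in that listing share one symptom: ls shows '?' for the file type, meaning the mode word matches no known file type. A minimal Python sketch that scans a directory for such entries, assuming the damaged inodes can still be read with lstat() as they evidently could be by ls; has_unknown_type and damaged_entries are hypothetical helper names, and this is a quick triage aid rather than a substitute for fsck:

```python
import os
import stat
import sys

# File-type predicates from the stat module; ls prints '?' when none match.
# S_ISDOOR covers Solaris door files and simply returns False elsewhere.
KNOWN_TYPES = (stat.S_ISDIR, stat.S_ISREG, stat.S_ISLNK, stat.S_ISCHR,
               stat.S_ISBLK, stat.S_ISFIFO, stat.S_ISSOCK, stat.S_ISDOOR)


def has_unknown_type(st):
    """True if the mode in a stat result matches no known file type."""
    return not any(is_type(st.st_mode) for is_type in KNOWN_TYPES)


def damaged_entries(directory):
    """Sorted names in `directory` whose inode has no recognisable file type.

    Entries that cannot be lstat()ed at all are reported too, since on a
    damaged file system that is another sign of a bad inode.
    """
    suspects = []
    for name in os.listdir(directory):
        try:
            st = os.lstat(os.path.join(directory, name))
        except OSError:
            suspects.append(name)
            continue
        if has_unknown_type(st):
            suspects.append(name)
    return sorted(suspects)


if __name__ == "__main__":
    # Example: python damaged_entries.py /usr/ccs/lib
    for name in damaged_entries(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(name)
```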

Obviously, the inodes for these entries have been largely overwritten
with zeroes: the names survive, but the file types, link counts,
sizes and timestamps do not. I'll recover this directory from tape,
but my question is (finally):

*********

Could the problem be:

1) the length of the SCSI chain (i.e. get another SCSI card), or is
it more likely to be (as I suspect)

2) a problem root disk (i.e. replace the root disk and the problems
will go away).

Or both...?

*********

Apologies for the length of this posting, but there is a lot of
information which I feel is relevant to this problem.

Thanks in advance,

M. Reynolds (part-time sys-admin looking for a job!).

------- End of forwarded message -------
