SCSI errors

2007-12-25 11:48:00

SUMMARY OF PROBLEM:

SCSI tagged queuing cmd timeout errors (see original post below).

ATTEMPTS AT A SOLUTION:

 o Disable tagged queuing (TQ) for the entire system:

   -Add this to /etc/system:

         set scsi_options=0x80

Temporarily turning off TQ provided a quick solution, but substantially

degraded performance. Solaris also sent 10 Warning messages to politely

acknowledge that TQ was disabled.
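
For anyone who wants to see what a given scsi_options value switches on, here is a minimal Python sketch. It assumes the bit layout I believe Solaris uses for the SCSI_OPTIONS_* flags in <sys/scsi/conf/autoconf.h> (0x8 disconnect/reconnect through 0x200 wide); check that against your own release before relying on it. The helper names and the 0x1f8 starting value are made up for illustration. Under that assumed layout, assigning 0x80 outright leaves only the tagged-queuing bit set and clears everything else, whereas masking out the single bit keeps the other features.

```python
# Assumed bit layout of scsi_options (believed to match the SCSI_OPTIONS_*
# flags in <sys/scsi/conf/autoconf.h>); verify against your own release.
SCSI_OPTION_BITS = {
    0x008: "disconnect/reconnect",
    0x010: "linked commands",
    0x020: "synchronous transfer",
    0x040: "parity",
    0x080: "tagged queuing",
    0x100: "fast SCSI",
    0x200: "wide SCSI",
}

TAGGED_QUEUING = 0x080  # assumed value of the tagged-queuing bit


def decode_scsi_options(value):
    """Return the names of the features whose bits are set in the mask."""
    return [name for bit, name in sorted(SCSI_OPTION_BITS.items()) if value & bit]


def without_tagged_queuing(value):
    """Clear only the tagged-queuing bit and leave every other bit alone."""
    return value & ~TAGGED_QUEUING


if __name__ == "__main__":
    start = 0x1F8  # hypothetical starting mask, for illustration only
    for mask in (0x80, start, without_tagged_queuing(start)):
        print(hex(mask), decode_scsi_options(mask))
```

Running it prints one line per mask, so the value that is about to go into /etc/system can be compared against the value that was intended.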

 o Throttle the number of TQ commands:

   -Add this to /etc/system:

        forceload: drv/esp

        set sd:sd_max_throttle=10

This reduced the number of allowed TQ commands, but I still received

timeout errors.

 o Disable TQ for a specific target or controller:

   -Add this to /kernel/drv/esp.conf for a specific target:

         name="esp" parent="/iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000"

         reg=0xf,0x800000,0x40

         target1-scsi-options=0x58

         scsi-options=0x178;

This option was supposed to turn off TQ for the specific target, but it

turned off TQ for the entire controller. I tried tweaking it, but I wasn't

able to turn off TQ for a specific target. I don't know if it was my poor

kung-fu or the extent of the problem.
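
A possible explanation, offered as a sketch rather than a diagnosis: if tagged queuing is bit 0x80 of the options mask (my assumption, taken from the Solaris SCSI_OPTIONS_* flags) and a targetN-scsi-options entry overrides the controller-wide scsi-options for that one target (also an assumption about how the driver resolves them), then neither 0x58 nor 0x178 has the TQ bit set. The Python below works through that arithmetic. The function names and the second, alternative set of values are hypothetical and were not tested on the machine described here.

```python
TAGGED_QUEUING = 0x80  # assumed value of the tagged-queuing bit


def effective_options(target, controller_options, per_target):
    """Assumed precedence: a per-target entry wins, else the controller value."""
    return per_target.get(target, controller_options)


def targets_with_tq(controller_options, per_target, targets=range(7)):
    """List the targets whose effective mask still has the TQ bit set."""
    return [
        t
        for t in targets
        if effective_options(t, controller_options, per_target) & TAGGED_QUEUING
    ]


if __name__ == "__main__":
    # Values from the esp.conf entry above: no target keeps TQ.
    print(targets_with_tq(0x178, {1: 0x58}))

    # Hypothetical alternative: keep the TQ bit controller-wide (0x1f8)
    # and clear it only for target 5.
    print(targets_with_tq(0x1F8, {5: 0x178}))
```

If the assumptions hold, the first call reports no targets with TQ, which would match the controller-wide effect seen above, and the second reports every target except 5.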

BOTTOM LINE:

   1. One of the disks did not seem to properly support tagged queuing.

Turning off TQ on the entire controller was necessary to support this

not-fully-SCSI-2 disk.

   2. The DAT drive (which I didn't suspect at first) is having SCSI-level

hardware trouble. To say it another way, this DAT causes the same TQ

errors on *other targets* when attached to my test system (an SS10).

KUDOS TO:

David Schiffrin <daves@adnc.com> (thanks for the resend, too)

Joel Lee <jlee@thomas.com>

Sanjay Srivastava <sanjays@netcom.com>

bismark@alta.Jpl.Nasa.Gov (Bismark Espinoza)

At 2:35 PM -0500 11/12/97, Mark C. Farone wrote:

>Hello, all.

>

>I have a SparcStation20 running Sol2.5.1, primarily as a host for Sybase

>SQL Server.

>

>Periodically for the past 2 weeks, when writing into the raw disk used by

>Sybase at c1t5d0s5, I get this message:

>

>Nov 12 13:42:23 sun3 unix: WARNING: /iommu@f,e0000000/sbus@f,e0001000/dma@0,81000/esp@0,80000 (esp1):

>Nov 12 13:42:23 sun3 unix: Disconnected tagged cmds (8) timeout for Target 5.

>Nov 12 13:42:24 sun3 unix: 0

>Nov 12 13:42:24 sun3 unix: WARNING: /iommu@f,e0000000/sbus@f,e0001000/dma@0,81000/esp@0,80000/sd@5,0 (sd20):

>Nov 12 13:42:24 sun3 unix: SCSI transport failed: reason 'timeout': re

>Nov 12 13:42:25 sun3 unix: trying command

>Nov 12 13:42:25 sun3 unix: WARNING: /iommu@f,e0000000/sbus@f,e0001000/dma@0,81000/esp@0,80000/sd@5,0 (sd20):

>Nov 12 13:42:25 sun3 unix: SCSI transport failed: reason 'reset': retr

>Nov 12 13:42:25 sun3 unix: ying command

>

>

>What I have tried:

> 1. Upgraded to new harddisks (which I had planned to do anyway).

> 2. Tried new cables.

> 3. Tried new active terminators.

> 4. Tried reseating the card.

> 5. Tried putting the disks on another controller (c0). In this case, I

>get the same error, just specific to c0. In fact, I moved everything off c1

>and put them all on c0 (which, btw is where the / fs lives).

>

>For what it's worth, currently I have a DAT drive at c1t4, and harddisks at

>c0t3, c1t1 and c1t5.

>

>It appears that regardless of the controller, disks, or cables, I get this

>error which points to the raw disk used by Sybase.

>

>An impact of this problem is that Sybase blocks all other spid's until the

>SCSI times out (between 1-2 minutes!) during which the spid is unkillable.

>

>Of course, this isn't happening on any other machines with exactly the same

>hardware and software setup.

>

>Thanks for *any* help.

>

>--

>Mark C. Farone Hooked on fishin'

>Systems Analyst, Gainesville Sun Not drugs.

>farone@gvillesun.com


--
Mark C. Farone Why read when you can
Systems Analyst, Gainesville Sun Just sit and stare at things?
farone@gainesville.fl.us -schwa
