Problems with Exabyte Tape Drive

2007-12-25 9:35:00

Sorry for the delay in this summary - I wanted to find a resolution before

posting. My original message was:

******

I have been having a heck of a time with our 10 Gig 8mm Tape Desktop

Storage Pack - it is an Exabyte 8500 attached to a Sparc 20 running Solaris 2.5.

It started as a hardware problem, with the drive eating the tapes that I

would put in it. I called hardware support and they came out and replaced

the drive with an identical model. Then, when I put tapes in it, it would

spit them out after 30 seconds, with the error lights on. So, I called

hardware and they replaced the drive again. Now it will accept the tape

and it gives me a ready light, but when I do an mt status for /dev/rmt/0

(its correct device name) I get "/dev/rmt/0: no tape loaded or drive

offline". I have rebooted the machine with a -r to reconfigure and have

checked the device files to see if perhaps they were corrupt, and they

look fine. I then ran through drvconfig and tapes with a Sun technician

to no avail. The cable I assume is ok, since it is in the middle of a

scsi chain and the disk drives after it are working just fine. Is that an

incorrect assumption? Nothing changed after the hardware replacement,

just the internal drive, no changes to the target id or anything.

Sun software support says it is definitely hardware, and hardware is

telling me it is definitely software, and neither of them will help me any

further. I am lost and I really need to get some backups done.

*******

SOLUTION:

I should have included in my post some more information about the

situation (such as the fact that probe-scsi-all did indeed show the

device, address numbers, etc), so most of the responses I got were asking

for more information. I did get some very useful information about SCSI

chains and the st.conf file. One noticeable comment is that Exabyte is not

a very well-liked tape drive! =)

In the end, I still cannot get the tape drive to work with its original

system, so I took it off and attached it to another Solaris box to at

least do remote backups. Since it works fine on the other box (using the

same cabling from the first box), I have pretty much ruled out hardware

and will now concentrate on finding a software solution.

Anyway, thanks to everyone who responded, I really appreciate you all

taking the time to help me out with this problem! Below are all of the

responses I got in case anyone needs them.

Best,

Leslie Forte

------------------------------------------------------------------------

Leslie Forte Reiss Science 238

UNIX Systems Administrator Phone: (202) 687-3108

Academic Computing Services Fax: (202) 687-6003

Georgetown University leslie@georgetown.edu

------------------------------------------------------------------------

RESPONSES:

From: Michael Kohne <mhkohne@moberg.com>:

First off, let me say that I can't stand exabyte. As a company, I don't

think much of them, or their products. We had an exabyte 4mm dat that gave

us a similar series of troubles, which was eventually replaced by them with

an 8mm unit (apparently even they can't get their 4mm units to work right).

I'd personally put more faith in sun's support people knowing what they are

doing than exabyte's.

I don't have specific knowledge of your problem but I'd lay odds on it

being the exabyte. Does scsiinfo or scsiping give anything useful on the

drive? If not, try hooking it to some other type of machine (I actually

like macs for this) and use a good scsi probing utility on it. This can

often reveal information that is hidden behind unix drivers.

Also, I would try getting some spare cables and see if you can make

anything different happen. You should probably also check all the cables

and connectors for problems - it's not that hard to bend those pins, and

depending on which pin you bend, it might cause interesting problems that

don't manifest the way you'd expect. Try shuffling devices around on the

SCSI bus (as in, change which drive is cabled first).

-------------------------------------------------------------------------

From: Ric Anderson <ric@rtd.com>:

If you have a hardware and software maintenance contract with Sun,

(which it sounds like you do) call in the problem and don't stop

escalating till it's fixed. Also cry on your salesperson's shoulder.

He or She will be bright enough to see lost sales if this doesn't get

resolved :-)

Of course, if you want to poke about yourself, I'd try the following -

most of which you've probably already done :-)

1. Halt the machine (a crash a day keeps the users away)

2. do a "probe-scsi-all" and make sure the tape DRIVE as well as

    the stacker answer the probe properly, and at the expected SCSI ID.

3. If they don't, power off the tape enclosure and pop the skins on

    it and verify that the 50 pin ribbon connector is fully seated

    in the drive. It's amazing what a slight angle on that connector

    can foul up.

4. Check the power plug also, at the back of the drive to make sure

    it is fully seated.

5. If the hardware looks fine, go to /dev/rmt and remove all the

    links. Then do a boot -r (or /usr/sbin/drvconfig followed by

    /usr/sbin/tapes).
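
Step 5 above can be sketched as a small shell helper. This is only an illustration, not something from the original reply: the function names and the DRY_RUN variable are made up for this sketch, and it assumes the drive is /dev/rmt/0 as in the original post. By default it only prints what it would do.

```shell
#!/bin/sh
# Sketch of step 5: clear the tape device links and let Solaris rebuild them.
# run, rebuild_tape_links and DRY_RUN are hypothetical names for this example.
# DRY_RUN defaults to 1, so the commands are only printed; set DRY_RUN=0
# (as root, on the affected host) to actually run them.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

rebuild_tape_links() {
    # Remove the existing links under /dev/rmt.
    run sh -c 'rm -f /dev/rmt/*'
    # Rebuild the /devices tree, then recreate the /dev/rmt links.
    run /usr/sbin/drvconfig
    run /usr/sbin/tapes
    # Ask the drive for its status through the rebuilt link.
    run mt -f /dev/rmt/0 status
}

# Usage (prints the four commands without running them):
#   rebuild_tape_links
```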

---------------------------------------------------------------------

From: John Stoffel <jfs@fluent.com>:

Leslie,

My first suggestion is to open up the case holding the drive and to

check and make sure that any DIP switches are set correctly for your

system. This should have been done when they took away the original

drive, but they may have forgotten to do so. Of course this assumes

there are some dip switches.

Then try shutting down the system and checking for the drive with a

'probe-scsi-all' command from the prom level. Does the drive come

back with the same name and rev numbers as before? You might have

gotten a drive with newer/older rev firmware that your system doesn't

understand properly.

What happens if you put a cleaning tape into the drive? Does it make

some whirring noises and then spit it out again? Then try booting the

system (boot -rv) and seeing what 'mt status' says then. Watch the

console carefully as it boots up to make sure it sees the tape drive

properly. The -v flag will give you verbose output.

Go out and buy a DLT tape drive. You'll be happier and I wish I could

do the same with our jukebox.

----------------------------------------------------------------------

From: Frank Pardo <fpardo@tisny.com>:

Previous summaries on this list have talked about the position of

devices on the SCSI chain. You might try removing everything but the

tape drive, as a test. And if that works, try adding other devices one

at a time, both closer to the computer than the tape drive, and outboard

from it.

Again quoting from previous summaries, the difference between active and

passive SCSI terminators can be important. People are always saying to

use active terminators.

This may sound stupid, but... Have you experimented with more than one

tape in the drive? It could be that your test tape is defective...

-----------------------------------------------------------------------

From: "Dan A. Zambon" <dzambon@afit.af.mil>:

Hi,

I am not sure if this will help you or not. I have two tape

stackers (from MTI) on my Sun environment and have had troubles

galore with them. However, when I replace the exabyte drives

inside the stacker, I have to make sure that the CEI numbers

of the new replacement drive match those of the drive

being replaced.

For example, on my 10 tape stacker I just replaced the EXB-8505

drive. The CEI number on the drive is 870010*025. This number

(except for the 025 part) must be exactly the same for both drives.

If not, the mt (or tar) commands do not recognize the device.

I hope this helps - and I wish you all the luck....
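
The CEI rule described above (everything before the "*" must match, the part after it may differ) can be checked with a couple of shell functions. This is a rough sketch based only on the description in this message; the function names are invented here, and the second CEI number in the usage comment is a made-up value for illustration.

```shell
#!/bin/sh
# cei_base and cei_compatible are hypothetical helper names.

# Print the part of a CEI number before the first "*".
cei_base() {
    printf '%s\n' "${1%%\**}"
}

# Succeed if two CEI numbers agree on everything before the "*".
cei_compatible() {
    [ "$(cei_base "$1")" = "$(cei_base "$2")" ]
}

# Usage ('870010*031' is an invented example value):
#   cei_compatible '870010*025' '870010*031' && echo "base numbers match"
```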

---------------------------------------------------------------------

From: Mark Hargrave <root@wisdom.maf.nasa.gov>:

Have you tried a different brand of tapes?

---------------------------------------------------------------------

From: ssayer@aisys.com:

Have you checked to make sure that this device is not internally terminated and that the SCSI bus itself is not incorrectly terminated in some manner?

----------------------------------------------------------------------

From: Jay Lessert <jayl@latticesemi.com>:

[horror story clipped...]

My sympathies.

1) We *have* run into two bad 8500 refurb jobs in a row. It's possible.

2) We have run into "weak" power supplies on exb-10 stackers (the

    older ones, without the LCD display). One time we ended up running

    a stacker for a year with:

    - the cover off,

    - the stacker mechanism running off the internal power supply

    - the tape drive itself running off an external power supply

    It was the only way we could make it work, and we tried

    *everything*.

3) Just because the drive is in the middle of the SCSI chain doesn't

    mean the SCSI chain is ok (one or both of the internal SCSI

    connectors could be completely open, for example). So if you

    haven't done it yet, rewire the SCSI chain completely, or move

    the stacker to another cpu and try it there with another cable.

----------------------------------------------------------------------

From: Jim Harmon <jim@telecnnct.com>:

What you didn't mention here are the following things:

        Is the drive Fast SCSI, FastWide SCSI, active, passive?

        How long is your entire SCSI chain, SE? Differential?

        How many SCSI devices are ON the chain? 3? 4? 5? 6?

        What is the ADDRESS of the tapedrive? (I assume it's 5 or 6

        by default)

        Is your kernel configured to support the tape drive on an

        address other than 5 or 6?

Any of these issues could impact your problem. :)

-------------------------------------------------------------------------

From: Bob Woodward <bobw@kramer.filmworks.com>:

Check your /kernel/drv/st.conf file. I'm running Solaris 2.4 on a Sparc 20

and though I'm using a DLT tape, I still have the settings for the Exabyte

drives. I'll include the relevant entries from my file so you can check

them against what's in yours:

(stuff at the top of the file.......)

        "EXABYTE EXB-2501", "Exabyte 2501 MiniQIC", "WtQIC",

        "EXABYTE EXB-2502", "Exabyte 2502 MiniQIC", "WtQIC",

        "EXABYTE EXB-8205", "Exabyte 8205", "Exa8200c",

        "EXABYTE EXB8500C", "Exabyte 8500c", "Exa8500c",

        "EXABYTE EXB-8505", "Exabyte 8505", "Exa8500c",

        "EXABYTE IBM-8505", "IBM Flavor 8505", "Exa8500c",

        "EXABYTE IBM-85XL", "IBM Flavor 8505XL", "Exa8500c",

(more stuff in the middle of the file.....)

Exa8200c= 1,0x35,0,0xd639,4,0x14,0x14,0x14,0x90,1

Exa8500c= 1,0x35,0,0xd639,4,0x14,0x15,0x15,0x8c,1

WtQIC = 1,0x32,512,0xc40a,1,0x00,0;

(a bunch more other stuff at the bottom of the file.....)

The commas at the end of the first section are important and the last

'group' entry for the second section should end with a semicolon. (The

WtQIC line is the last entry of the parameter listings in my file.)
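
One quick sanity check on a file laid out like the excerpt above is to confirm that every name in the third column of the first section has a matching "name = ..." line in the second section. The following is a rough sketch only: st_conf_missing_props is an invented name, and the parsing assumes one quoted triple per line and one definition per line, as in the excerpt.

```shell
#!/bin/sh
# st_conf_missing_props is a hypothetical helper name.
# Print every property named in a quoted triple that has no
# "name = values" definition line in the same file.
st_conf_missing_props() {
    awk '
        # Triple line: three double-quoted strings; the third names a property.
        /^[ \t]*"[^"]*"[ \t]*,[ \t]*"[^"]*"[ \t]*,[ \t]*"[^"]*"/ {
            split($0, parts, "\"")
            wanted[parts[6]] = 1
            next
        }
        # Definition line: a bare name followed by "=".
        /^[ \t]*[A-Za-z0-9_-]+[ \t]*=/ {
            name = $0
            sub(/^[ \t]*/, "", name)
            sub(/[ \t]*=.*/, "", name)
            defined[name] = 1
        }
        END {
            for (p in wanted) if (!(p in defined)) print p
        }
    ' "$1"
}

# Usage: no output means every referenced property is defined.
#   st_conf_missing_props /kernel/drv/st.conf
```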

Hope this helps a little bit. If need be, I can probably email the whole

file to you if you think that will help.

Of course, I'm getting ready to go for the weekend so I won't be back until

Tuesday. Good luck.

---------------------------------------------------------------------------

From: White Gary SrA USAFE CSS/SCOE <Gary.White@ramstein.af.mil>:

Ensure the target ID of the tape drive does not conflict with any of

your other devices. Do a probe-scsi in PROM mode to ensure all devices

are unique and are being seen.

Gary White
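
If the target IDs are noted down by hand (from the enclosure switches or a cabling plan, say), a one-line shell function can flag any ID that was written down twice. This is only a sketch; duplicate_scsi_ids is an invented name and the IDs in the usage comment are made-up values.

```shell
#!/bin/sh
# duplicate_scsi_ids is a hypothetical helper name.
# Print each target ID that appears more than once in the argument list.
duplicate_scsi_ids() {
    printf '%s\n' "$@" | sort -n | uniq -d
}

# Usage (invented IDs; no output means every ID is unique):
#   duplicate_scsi_ids 0 1 3 5 3
```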
