LinuxQuestions.org
Old 10-12-2004, 08:06 AM   #1
RX100
LQ Newbie
 
Registered: Oct 2004
Location: Belgium
Distribution: Fedora Core 2 kernel 2.6.5-1.358
Posts: 4

Rep: Reputation: 0
Due to time out, SCSI bus reset and device offline


Hello,

I am using a Linux server (Fedora Core 2) and am experiencing problems with a SCSI library.

Due to an unknown SCSI problem, a command times out on the SCSI bus and the Linux server tries to recover by performing the following steps:
1. aborts the SCSI command that has timed out
2. attempts a SCSI device reset
3. attempts a SCSI bus reset
4. attempts a SCSI host bus adapter reset

If all these fail, the device is then set "offline" which means that it is no longer accessible.
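
The escalation described above can be pictured with a small toy model. This is only a sketch of the order of the steps as listed in this post, not the kernel's error-handler code; the step names and the `try_step` callback are made up for illustration.

```python
from typing import Callable, List, Tuple

# Recovery steps in the order listed in the post (illustrative names only).
STEPS = ["abort command", "device reset", "bus reset", "host adapter reset"]


def recover(try_step: Callable[[str], bool]) -> Tuple[str, List[str]]:
    """Try each step in order and stop at the first one that succeeds.

    Returns the final device state ("online" or "offline") together with
    the list of steps that were attempted.
    """
    attempted: List[str] = []
    for step in STEPS:
        attempted.append(step)
        if try_step(step):
            return "online", attempted
    # Every step failed, so the device is marked offline.
    return "offline", attempted


if __name__ == "__main__":
    # Hypothetical device that never answers any recovery attempt.
    state, tried = recover(lambda step: False)
    print(state, tried)
```

In this model a device only ends up "offline" after all four steps have been tried and failed, which matches the behaviour reported here.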

I tried the sg_reset command to make the device accessible again, but it does not work. The device remains offline.
The only working "reset" is to reboot the server.

Does anyone know whether it is possible to bring the device back online (accessible) without having to reboot the server?

Thanks
 
Old 10-12-2004, 10:04 AM   #2
mritch
Member
 
Registered: Nov 2003
Location: austria
Distribution: debian
Posts: 667

Rep: Reputation: 30
what hardware (controller + drive) is this happening on?

sometimes drives need a long time to actually respond to a reset so maybe raising the bus reset delay can help.

if your kernel is compiled with scsi logging support you're able to raise logging verbosity by issuing a command to the driver - have a look into your driver documentation/source code about this.

sl mritch.
 
Old 10-13-2004, 03:10 AM   #3
RX100
LQ Newbie
 
Registered: Oct 2004
Location: Belgium
Distribution: Fedora Core 2 kernel 2.6.5-1.358
Posts: 4

Original Poster
Rep: Reputation: 0
Hi,

The hardware is a standard off-the-shelf Server with an Adaptec AHA-2944UW driving an IBM 3583 library.

The timeout error itself does not worry me too much.
What worries me is that when a timeout occurs, the device goes offline and becomes unavailable. I have not yet found a command or program that brings the device back online.

Currently I must reboot the server, which is no fun when the problem happens over the weekend and my backups are not performed.

I designed a driver to drive the library, and when a timeout error occurs, the library becomes unavailable. None of the resets I tried re-enables the device.

After a timeout, here is what happens (the device is still there but it is unavailable):

# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 06 Lun: 00
Vendor: IBM Model: ULT3583-TL Rev: 2.80
Type: Medium Changer ANSI SCSI revision: 02
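
For scripting, the host/channel/id/lun numbers can be pulled out of that listing. The following is a minimal sketch that assumes the `Host: scsiN Channel: .. Id: .. Lun: ..` line format shown above; the function name `parse_hctl` is made up for this example.

```python
import re
from typing import List, Tuple

# Matches lines such as "Host: scsi0 Channel: 00 Id: 06 Lun: 00".
HOST_LINE = re.compile(
    r"Host:\s*scsi(\d+)\s+Channel:\s*(\d+)\s+Id:\s*(\d+)\s+Lun:\s*(\d+)"
)


def parse_hctl(text: str) -> List[Tuple[int, ...]]:
    """Return one (host, channel, id, lun) tuple per attached device."""
    return [tuple(int(g) for g in m.groups()) for m in HOST_LINE.finditer(text)]


# The listing from this post, used as sample input.
SAMPLE = """Attached devices:
Host: scsi0 Channel: 00 Id: 06 Lun: 00
Vendor: IBM Model: ULT3583-TL Rev: 2.80
Type: Medium Changer ANSI SCSI revision: 02
"""

if __name__ == "__main__":
    print(parse_hctl(SAMPLE))
```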

I have a little program that sends a TestUnitReady to the device:
testscsi <dev> <LUN> <command> where the command is a TestUnitReady

# ./testscsi /dev/sga 0 0

[E] Unable to open SCSI device /dev/sga, errno=6 (scsilib/SCSIOpenDevice).
[E] Unable to connect to device /dev/sga (testscsi/main).

I get back errno 6, which is defined as follows:

#define ENXIO 6 /* No such device or address */
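
The open step that fails here can be reproduced without the testscsi program. This is a rough sketch, not a replacement for it: it only tries to open the device node and reports the errno, and it does not send any SCSI command. The helper names `probe_open` and `describe` are made up; `/dev/sga` is the node from this post.

```python
import errno
import os
from typing import Optional


def probe_open(path: str) -> Optional[int]:
    """Return None if the device node opens, else the errno of the failure."""
    try:
        fd = os.open(path, os.O_RDWR | os.O_NONBLOCK)
    except OSError as exc:
        return exc.errno
    os.close(fd)
    return None


def describe(err: int) -> str:
    """Turn an errno value into its symbolic name plus the system message."""
    name = errno.errorcode.get(err, "UNKNOWN")
    return f"{name} ({err}): {os.strerror(err)}"


if __name__ == "__main__":
    result = probe_open("/dev/sga")
    print("open ok" if result is None else describe(result))
```

If this prints ENXIO while the device is still listed in /proc/scsi/scsi, that would be consistent with the device node existing but the device behind it being marked offline.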

According to a Linux SCSI document I found online (http://www.linuxforum.com/linux-scsi/x215.html), taking the device offline may be normal behaviour, but that does not help me.

Anyway, if someone has a “workaround” to bring the device back online without rebooting, that would be great.
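
One thing that might be worth trying is removing the device from the SCSI mid-layer and adding it again through /proc/scsi/scsi, using the `scsi remove-single-device` and `scsi add-single-device` commands. Whether this clears the offline state on this kernel and adapter is an assumption that has not been tested in this thread. The sketch below only builds the two command strings for host 0, channel 0, id 6, lun 0 (taken from the listing above); the function names are made up, and writing the commands requires root and a device that is not in use.

```python
from typing import Tuple


def rescan_commands(hctl: Tuple[int, int, int, int]) -> Tuple[str, str]:
    """Build the remove and add commands for one host/channel/id/lun."""
    host, channel, target, lun = hctl
    ids = f"{host} {channel} {target} {lun}"
    return (
        f"scsi remove-single-device {ids}",
        f"scsi add-single-device {ids}",
    )


def send(command: str, proc_path: str = "/proc/scsi/scsi") -> None:
    """Write one command to the proc interface (needs root on a real system)."""
    with open(proc_path, "w") as handle:
        handle.write(command + "\n")


if __name__ == "__main__":
    # Only print the commands here; call send() deliberately, as root.
    for cmd in rescan_commands((0, 0, 6, 0)):
        print(cmd)
```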

RX100
 
Old 10-13-2004, 09:51 AM   #4
mritch
Member
 
Registered: Nov 2003
Location: austria
Distribution: debian
Posts: 667

Rep: Reputation: 30
has this just started or have you set up the system lately? is it random? does it continue spinning?
you can try the scsi-tools to down/up the disk, but as even a host reset doesn't get the drive back online i don't think it will be any help.

checking/changing cables has helped me solve scsi problems a few times now.
if modern drives get too hot they are able to take themselves offline.
anyway iirc errorcode 9 would be timeout - don't know about 6.
i'd suggest enabling verbose scsi logging and checking whether you can get more info on this issue.
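
a rough sketch of how a logging value could be put together, assuming the kernel was built with scsi logging support and exposes /proc/sys/dev/scsi/logging_level. the layout used here (3 bits per facility, error at bit 0, timeout at bit 3, and so on) is written from memory of the kernel's scsi_logging.h, so check it against your kernel source before relying on it. the function name is made up.

```python
# Bit offset of each logging facility (assumed layout, 3 bits per facility).
SHIFTS = {
    "error": 0,
    "timeout": 3,
    "scan": 6,
    "mlqueue": 9,
    "mlcomplete": 12,
    "llqueue": 15,
    "llcomplete": 18,
    "hlqueue": 21,
    "hlcomplete": 24,
    "ioctl": 27,
}


def logging_level(**levels: int) -> int:
    """Combine per-facility levels (0 to 7) into one logging_level value."""
    value = 0
    for facility, level in levels.items():
        if facility not in SHIFTS:
            raise ValueError(f"unknown facility: {facility}")
        if not 0 <= level <= 7:
            raise ValueError(f"level out of range for {facility}: {level}")
        value |= level << SHIFTS[facility]
    return value


if __name__ == "__main__":
    # Verbose error-handler and timeout logging; the result would be
    # written, as root, to /proc/sys/dev/scsi/logging_level.
    print(logging_level(error=3, timeout=3))
```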

sl mritch.
 
  


All times are GMT -5. The time now is 09:19 PM.
