Linux - Hardware This forum is for Hardware issues.
|
|
08-18-2006, 06:32 AM
|
#1
|
Senior Member
Registered: Nov 2005
Location: Belgium
Distribution: Red Hat, Fedora
Posts: 1,515
|
SCSI bus hangs on Fedora Core 3
Occasionally, I see a lot of abort, device reset and SCSI bus reset operations, all of which time out.
The only SCSI device I have is a Promise Ultratrak RAID device.
The messages (taken from dmesg) look something like this:
sym0:1:0: ABORT operation started.
sym0:1:0: ABORT operation timed-out.
and they seem to be coming from the kernel (according to /var/log/messages).
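One way to see how often these events occur is to tally them from the kernel log. The sketch below is only an illustration: scsi_eh_summary is a hypothetical helper name, and it assumes that every error-handler message follows the "symN:T:L: OPERATION operation status." shape of the two lines quoted above (other operations, such as resets, are assumed to be logged the same way).

```shell
# Hypothetical helper: tally SCSI error-handler messages found in kernel
# log text read from stdin. The pattern is modelled on the two messages
# quoted in this post; other message shapes are an assumption.
scsi_eh_summary() {
    grep -E 'sym[0-9]+:[0-9]+:[0-9]+: [A-Z ]+ operation ' |
        # Drop everything up to and including the "symN:T:L: " prefix,
        # then drop the trailing full stop.
        sed -E 's/^.*sym[0-9]+:[0-9]+:[0-9]+: //; s/\.$//' |
        sort | uniq -c | sort -rn
}

# Usage:
#   dmesg | scsi_eh_summary
#   scsi_eh_summary < /var/log/messages
```

Comparing the counts between incidents may show whether the aborts are becoming more frequent.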
After a few such messages, the disk becomes unusable. Any attempt to access it (ls, find, cd, etc.) just hangs. ps shows that the processes are hanging (presumably waiting on I/O).
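Processes that are blocked on I/O normally appear in uninterruptible sleep, which ps reports as state D. A minimal sketch for listing them follows; filter_d_state is a hypothetical name, and the filter assumes the STAT value is the second column, as in the ps invocation shown in the comment.

```shell
# Hypothetical helper: keep the header line plus every process whose
# STAT column (assumed to be column 2) starts with D.
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Usage (the wchan column hints at what each process is waiting on):
#   ps -eo pid,stat,wchan:30,args | filter_d_state
```

Processes in this state cannot be killed until the I/O completes, which would be consistent with umount and shutdown failing as well.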
Attempts to unmount the disk fail (umount says "device is busy"). Shutting down the machine ("shutdown -r now") doesn't work either.
To resolve the issue, I now do a hard shutdown (i.e. the power button) on the machine, shut down the RAID device, restart it and then reboot the FC3 computer. After that, everything returns to normal.
My kernel is 2.6.9-1.667
lspci reports:
02:08.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1010 Ultra3 SCSI Adapter (rev 01)
Any ideas on how to fix the issue? Is this bug fixed in later kernel versions? Or is my RAID device to blame?
|
|
|
08-19-2006, 10:20 PM
|
#2
|
Senior Member
Registered: Jan 2005
Location: Manalapan, NJ
Distribution: Fedora x86 and x86_64, Debian PPC and ARM, Android
Posts: 4,593
|
The symptom you're reporting is typical of a failing hard drive. Make sure you have a good backup.
You may also want to apply some maintenance; the current kernel for FC3 is: 2.6.12-1.1381_FC3
Last edited by macemoneta; 08-19-2006 at 10:25 PM.
|
|
|
08-21-2006, 02:17 AM
|
#3
|
Senior Member
Registered: Nov 2005
Location: Belgium
Distribution: Red Hat, Fedora
Posts: 1,515
Original Poster
|
Thanks for the response, macemoneta. I'll try the maintenance first. If that doesn't help, I'll try to get my hands on new disks.
Are there any risks involved in upgrading from the 2.6.9 kernel to the 2.6.12 kernel?
For instance, do I need to upgrade gradually (i.e. first to 2.6.10, then 2.6.11, then 2.6.12)?
|
|
|
08-21-2006, 07:59 AM
|
#4
|
Senior Member
Registered: Jan 2005
Location: Manalapan, NJ
Distribution: Fedora x86 and x86_64, Debian PPC and ARM, Android
Posts: 4,593
|
There is never a requirement for an intermediate upgrade, and there are no risks involved in the process. When you upgrade the kernel, the old kernel remains available (you will be given the choice at boot in the grub menu). If you encounter a problem, you can reboot and select the old kernel.
yum -y update kernel kernel-devel
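After the update, it can be worth checking which kernels the boot menu offers before rebooting. The sketch below assumes GRUB legacy with its menu in /boot/grub/grub.conf; list_grub_entries is a hypothetical helper name. It numbers the title entries from 0, which is how the default= line in that file refers to them.

```shell
# Hypothetical helper: print each "title" entry of a GRUB legacy config
# file with its zero-based index, matching the numbering used by the
# "default=" setting. The config path is passed as the first argument.
list_grub_entries() {
    awk 'BEGIN { n = 0 }
         /^title/ { sub(/^title[ \t]+/, ""); print n ": " $0; n++ }' "$1"
}

# Usage (path is an assumption for this system):
#   list_grub_entries /boot/grub/grub.conf
#   grep '^default=' /boot/grub/grub.conf
#   uname -r    # kernel currently running
```

If the old kernel is still listed, it can be selected from the menu at boot.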
|
|
|
08-22-2006, 02:02 AM
|
#5
|
Senior Member
Registered: Nov 2005
Location: Belgium
Distribution: Red Hat, Fedora
Posts: 1,515
Original Poster
|
OK. I'll give that a go, macemoneta, and post back when I have some results.
Thanks again.
|
|
|
08-25-2006, 02:31 AM
|
#6
|
Senior Member
Registered: Nov 2005
Location: Belgium
Distribution: Red Hat, Fedora
Posts: 1,515
Original Poster
|
The kernel upgrade has been done. After the latest SCSI hang, I decided to reboot with the new kernel to see if the problem is fixed.
Unfortunately, MySQL is now broken. It won't start and it lists a pthread_create error as the cause.
pthread_create seems to be a standard C function, so it may be part of glibc or something like that. Could it be that the new kernel can't work with my old C lib and I need to upgrade that lib too? I'm a bit hesitant to upgrade my C libraries, since this may very well break my processing.
|
|
|
08-25-2006, 07:37 AM
|
#7
|
Senior Member
Registered: Jan 2005
Location: Manalapan, NJ
Distribution: Fedora x86 and x86_64, Debian PPC and ARM, Android
Posts: 4,593
|
You appear to be caught between a rock and a hard place. You'll need to decide whether the problem is severe enough that you are willing to upgrade your entire system (yum -y update) or if you want to live with the current situation.
Another alternative would be to try to find a SCSI controller that doesn't exhibit a problem at your current software level.
|
|
|
08-25-2006, 08:13 AM
|
#8
|
Senior Member
Registered: Nov 2005
Location: Belgium
Distribution: Red Hat, Fedora
Posts: 1,515
Original Poster
|
Indeed. I'm caught between upgrading the entire system (unknown impact) and a disk that fails occasionally (it used to be around once every few months, but now it's up to once a week).
It's probably the best solution to get a new disk drive (the increasing frequency may indicate a more fundamental hardware problem).
Unless this "yum -y update" method is relatively painless?
Just to add to the confusion: normally, I only get the SCSI bus messages (i.e. ABORT, BUS RESET, etc.) I posted above and the disk is inaccessible (any attempt to access it freezes the terminal window).
Yesterday, on the other hand, the system encountered similar SCSI problems (after installation of the new kernel, but still running under the old one), but the disk came back "up" afterwards, in the sense that my programs didn't freeze while they were accessing the disk. Instead, a bunch of error messages came up telling me that the disk was "read-only" and that my program's attempts to write files to it failed.
Could the kernel install have caused this change in behaviour, even though the new kernel wasn't used?
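Whether a filesystem has switched to read-only can be confirmed from the mount table. The sketch below is a hypothetical helper (list_readonly_mounts is not a standard tool) that assumes the /proc/mounts layout, where the second field is the mount point and the fourth is the comma-separated option list. A common cause is a filesystem mounted with an option such as ext3's errors=remount-ro, though whether that applies here is not known.

```shell
# Hypothetical helper: print the mount point of every filesystem whose
# option list (4th field) contains the exact option "ro".
# Reads /proc/mounts unless another file is given as the first argument.
list_readonly_mounts() {
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") { print $2; break }
    }' "${1:-/proc/mounts}"
}

# Usage:
#   list_readonly_mounts
```

If the RAID volume shows up in this list after an incident, the kernel log around that time should contain the filesystem error that triggered the remount.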
|
|
|
08-25-2006, 07:53 PM
|
#9
|
Senior Member
Registered: Jan 2005
Location: Manalapan, NJ
Distribution: Fedora x86 and x86_64, Debian PPC and ARM, Android
Posts: 4,593
|
The kernel install could not have changed the behavior, if it was not used. It's likely that you are seeing multiple failures in that case, some of which are recoverable by your current kernel.
I obviously don't know your physical environment, but is it possible that the equipment is running hot? I've seen overheated drives/controllers give an assortment of failures. That you go so long between failures seems suspicious.
|
|
|
08-28-2006, 02:19 AM
|
#10
|
Senior Member
Registered: Nov 2005
Location: Belgium
Distribution: Red Hat, Fedora
Posts: 1,515
Original Poster
|
As for physical location, the computer is in a dedicated room with air conditioning.
However, the air conditioning can easily be shut down, and it is a bit suspicious that a similar computer in the same room also had disk issues around the same time. So you may be on to something with the overheating.
I'll try to look into the issue a bit further and will post back if I find anything.
|
|
|
All times are GMT -5.