I've been getting hard-disk-related errors. It first happened with a one-year-old disk, and I thought it was a hardware failure. However, I replaced it with a new one and I still get errors. Here is an excerpt from /var/log/kern.log:
Sometimes this completely locks up the system. Since this happened with two disks, I suppose some other factor is to blame. I tried another IDE cable and different kernels (2.4.25 and 2.6.5) without success. Finally, I found out that disabling DMA (with ide=nodma) makes the problem disappear, but of course with a big performance penalty. The strange thing is that the disk worked flawlessly for about a week.
How can I discover the culprit? Thanks for any idea or help.
It may be the IDE chipset. You could switch the cable from IDE0 to IDE1 (and make the corresponding changes in /etc/fstab and your bootloader configuration). This is only a partial test, because IDE0 and IDE1 share some circuitry. But if the fault lies in the part of the chipset dedicated to IDE0, the swap will show that the chipset is the problem.
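For the /etc/fstab part of that swap, here is a minimal sketch in Python. It assumes the classic IDE naming, where the first channel (ide0) carries hda/hdb and the second (ide1) carries hdc/hdd. The function name and the sample fstab entry are hypothetical, not taken from anyone's system in this thread.

```python
import re

# Assumed classic IDE naming: ide0 = hda (master), hdb (slave);
# ide1 = hdc (master), hdd (slave). Moving the cable from ide0 to ide1
# turns hda into hdc and hdb into hdd.
IDE0_TO_IDE1 = {"hda": "hdc", "hdb": "hdd"}

# Matches /dev/hda or /dev/hdb with an optional partition number.
_DEVICE = re.compile(r"/dev/(hd[ab])(\d*)\b")


def remap_fstab_line(line: str) -> str:
    """Rewrite ide0 device names in one fstab line to their ide1 names."""
    if line.lstrip().startswith("#"):
        return line  # leave comment lines untouched
    return _DEVICE.sub(
        lambda m: "/dev/" + IDE0_TO_IDE1[m.group(1)] + m.group(2), line
    )


# Hypothetical fstab entry, for illustration only.
sample = "/dev/hda1  /  ext3  defaults  1 1"
print(remap_fstab_line(sample))
```

Check the rewritten file by hand before rebooting. The root= entry in the bootloader configuration needs the same device-name change.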
I started having this same type of error all the time when I upgraded to a stock Debian 2.6.x kernel. Prior to my upgrade, I was dead-on stable (no glitches, etc.) for a full year!
I've even downloaded the HDD manufacturers' disk scanning utilities and run their exhaustive diagnostics on all of my disks, and they came up with no errors.
I'm starting to think that the late 2.4 (and upwards) kernels introduced the culprit....
Like the previous poster, I started getting this error (and a bunch of related ones) on a system that had been stable for several years, and it started the second I installed Fedora Core 3. Before, I had been using the default Red Hat 2.4 kernel, and now I'm using the 2.6 kernel. I'm also using reiserfs, and this error has been corrupting the file systems on my second IDE controller quite dramatically. I can't even back the files up.
I found a workaround for backing up files while I nail down the source of this problem....
I grabbed an ISO of Knoppix a while ago (version 3.3) and booted off that Knoppix disc. Then I was able to mount the drives and rsync my important stuff onto other disks, in case I corrupt my filesystems beyond recovery while troubleshooting.
I've been playing with re-compiling kernel 2.6.9 but haven't yet been able to nail down which disk option is causing these errors.
Maybe the DMA code changed in the new kernels? Maybe DMA is enabled by default now and it wasn't in the older kernels?
My hdparm -I /dev/hde output shows an asterisk next to UDMA5... but does that really mean DMA is on or off? I dunno.....
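On the asterisk question: as far as I know, hdparm -I lists the transfer modes the drive supports and puts an asterisk on the one currently selected, while whether the kernel driver is actually using DMA is what hdparm -d /dev/hde reports (using_dma). Here is a small Python sketch that picks the starred mode out of such a line. The sample line only imitates the shape of that output and is not taken from anyone's drive in this thread.

```python
from typing import Optional


def selected_dma_mode(dma_line: str) -> Optional[str]:
    """Return the mode marked with '*' in an 'hdparm -I' DMA line, if any."""
    _, _, modes = dma_line.partition(":")
    for token in modes.split():
        if token.startswith("*"):
            return token[1:]  # drop the leading asterisk
    return None  # no mode is marked as selected


# Illustrative line in the shape hdparm -I prints; an assumption, not real output.
sample = "DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5"
print(selected_dma_mode(sample))
```

A starred udma5 would mean UDMA mode 5 is the selected transfer mode. It is still worth checking using_dma separately, since booting with ide=nodma turns DMA off in the driver.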
I'm having similar problems. I'm wondering if it's the driver for a certain IDE controller or something more generic. What IDE controllers are people using? This is mine:
0000:00:10.0 IDE interface: ALi Corporation M5229 IDE (rev c4)
Onboard:
ICH4: IDE controller at PCI slot 0000:00:1f.1
PCI add-in card:
PDC20267: IDE controller at PCI slot 0000:01:05.0
Since my last post, I recompiled dozens of times (literally), from kernel 2.4.22 up through 2.6.10 RC2, enabling and disabling many different options related to IDE. I also swapped controller cards to rule out hardware failure. The best I have come up with was grabbing the .config file from Knoppix and recompiling 2.6.10 RC2, turning off just enough "options" to get a successful compile.
I'm stable again, but at a serious hit on drive performance.
Timing buffered disk reads: 10 MB in 3.27 seconds = 3.06 MB/sec
It used to be more like 60 MB in the same 3.27 seconds.
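To put those two figures side by side, here is the arithmetic as a short Python sketch. The 60 MB figure is the rough "used to be" estimate from the post, not a measurement, and the function name is just for illustration.

```python
def throughput_mb_per_s(megabytes: float, seconds: float) -> float:
    """Sequential read throughput for a timed read of a given size."""
    return megabytes / seconds


slow = throughput_mb_per_s(10, 3.27)  # figures quoted in the post
fast = throughput_mb_per_s(60, 3.27)  # the "used to be" estimate
print(f"now:      {slow:.2f} MB/sec")
print(f"before:   {fast:.2f} MB/sec")
print(f"slowdown: {fast / slow:.1f}x")
```

So the stable configuration reads at roughly a sixth of the earlier rate, about 3 MB/sec against about 18 MB/sec.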
@elfoozo: Do you know at which kernel you started seeing this behavior? I've always used a 2.5/2.6 kernel on this machine and I'm pretty sure this problem was absent before. I do not really remember when it started, because at first I thought it was a failing drive. If it hadn't been a laptop, I probably would have replaced it already...
Success! I re-compiled 2.6.10 and stripped out everything - including module support and have been stable on 2.6.10 for a full week. I'm even running at UDMA 5, disks are zippy again... Life is good.
Oh My Gods.
I went through exactly the same experience as the first poster, with the one-year-old drive, the exchange, the one-week wait and so on, except that I didn't switch cables because this is on my Toshiba laptop. I'll try installing that 2.6.10 kernel now. Must I "strip out everything" to make it work? I'm fairly n00 to that stuff...
THANKS so much for the tips.
I should've noticed that FC2test3 worked but newer distros (including Skolelinux) all got messed up. I didn't notice until I managed to do a minimal install without my HD breaking down; that was in a terminal, so I saw the errors come up.
Scratch that. When my laptop is warm, I can't even reformat the drive from my Partition Magic floppies (the longer it's been on, the . I guess I need a new one... although it seems weird to be dying on me after just one week. How sad.
By "strip out everything" I mean: I selected No on every "option" unless it specifically matched my hardware.
I've since recompiled a few more times turning on more kernel features and module support and USB support and still the disk errors are gone. I'm liking 2.6.10 a lot!
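When re-enabling options a few at a time like this, it can help to compare the .config that shows the errors with the one that doesn't. Here is a rough Python sketch of such a comparison. It relies on the usual .config layout (CONFIG_X=value lines and "# CONFIG_X is not set" comments) and, as a simplification, treats an option missing from one file as "n". The function names and the sample values are hypothetical and do not come from anyone's actual kernel config.

```python
def parse_kernel_config(text: str) -> dict:
    """Map option name -> value ('y', 'm', 'n', or a literal) from .config text."""
    options = {}
    suffix = " is not set"
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            # "# CONFIG_FOO is not set" means the option is disabled.
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, _, value = line.partition("=")
            options[name] = value
    return options


def config_diff(old: str, new: str) -> dict:
    """Return {option: (old_value, new_value)} for options that differ."""
    a, b = parse_kernel_config(old), parse_kernel_config(new)
    return {
        name: (a.get(name, "n"), b.get(name, "n"))  # missing counts as "n"
        for name in sorted(set(a) | set(b))
        if a.get(name, "n") != b.get(name, "n")
    }


# Hypothetical fragments, for illustration only.
failing = "CONFIG_MODULES=y\nCONFIG_USB=m\nCONFIG_BLK_DEV_IDEDMA_PCI=y\n"
stable = "# CONFIG_MODULES is not set\n# CONFIG_USB is not set\nCONFIG_BLK_DEV_IDEDMA_PCI=y\n"
for name, (before, after) in config_diff(failing, stable).items():
    print(f"{name}: {before} -> {after}")
```

Anything that appears in the diff is a candidate; options that are identical in both files can be set aside.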