Linux - Newbie
This Linux forum is for members that are new to Linux.
Just starting out and have a question?
If it is not in the man pages or the how-to's this is the place!
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
On a Slackware 13.37 32-bit system, I have four 2 TB SATA drives and a PATA boot/root drive.
Two of the 2 TB SATA drives keep dropping into some mode that makes them read-only and produces input/output errors when they are accessed over NFS. I am not clear why this is happening. Looking at dmesg, I see an unhandled error code and what may be a lost page write.
If I pull the drives and put them in another system, they appear to work fine.
SMART tests are fine.
The power supply shows good voltages under load.
A reboot brings the drives back online; over time, or with use, they fall back into this reduced mode.
Pointers?
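For anyone hitting the same symptoms, the kernel messages described above (an "unhandled error code" followed by a "lost page write") are the usual signature of the kernel remounting a filesystem read-only after an I/O error. A quick way to confirm is to scan dmesg output for those strings. Here is a sketch using a made-up sample of such messages (the device names and log lines are hypothetical, not taken from this system):

```shell
# Hypothetical excerpt of dmesg output resembling the errors described;
# on a live system you would pipe `dmesg` itself into the grep.
log='sd 2:0:0:0: [sdb] Unhandled error code
Buffer I/O error on device sdb1, logical block 1024
lost page write due to I/O error on sdb1'

# Count the tell-tale signatures of an I/O-error-induced read-only remount
echo "$log" | grep -c -i -e 'unhandled error' -e 'lost page write'
```

On the live box, `dmesg | grep -i -e 'unhandled error' -e 'lost page write'` shows the offending device, and checking `/proc/mounts` for an `ro` flag on that filesystem confirms the remount.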
With big hard disks, you must understand that the days of the MBR scheme are almost over; it has been overtaken by the sheer size of today's disk storage. There is a GNU/Linux way of overcoming this limitation (GPT partition tables). See for yourself; there is plenty of reading about it.
Good luck.
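For reference, the MBR limit in question comes from straightforward arithmetic: an MBR partition entry stores the start and length as 32-bit sector counts, so with 512-byte sectors the addressable maximum is 2^32 * 512 bytes, i.e. 2 TiB. A "2 TB" drive (2,000,000,000,000 decimal bytes) sits just under that limit, which is why MBR still works on the drives in this thread:

```shell
# MBR uses 32-bit LBA sector counts; with 512-byte sectors the ceiling is:
mbr_limit=$(( (1 << 32) * 512 ))    # 2^32 sectors * 512 bytes
echo "$mbr_limit"                    # bytes: 2199023255552
echo $(( mbr_limit / 1024 / 1024 / 1024 ))   # GiB: 2048

# A marketed "2 TB" drive is smaller than the MBR ceiling:
echo $(( 2000000000000 < mbr_limit ))        # 1 (true)
```

Anything larger than 2 TiB needs GPT (e.g. `parted /dev/sdX mklabel gpt`).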
All four 2 TB disks are MBR, and the smaller PATA disk is also MBR and under 1 TB.
However, this system worked flawlessly for two years, and I am trying to figure out how and why it has now decided to send two of the five drives into a toes-up mode.
I've done several passes of smartctl-initiated long tests, and there is no sign of any problem. Replacing the SATA and power cables only seemed to extend the time to failure.
Placing one of the 2 TB drives on a USB adapter and exercising it for several hours on another computer yielded no observed errors.
The drives seem to take longer to fail when not accessed, or when accessed locally rather than through NFS. However, the data is not totally definitive, and the 'studies' are not exactly controlled.
There are no temperature problems, nor apparent power problems, nor dirt accumulation on the MB/SATA controller.
Any other ideas, anyone? Perhaps I should have asked on the hardware forum, but I do not consider this strictly a hardware problem yet.
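For anyone following along: the long self-tests mentioned above are started with `smartctl -t long /dev/sdX` and their results read back with `smartctl -l selftest /dev/sdX`; the attribute table comes from `smartctl -A /dev/sdX`. The two attributes most directly tied to failing media are Reallocated_Sector_Ct and Current_Pending_Sector. Here is a sketch of filtering those out of the attribute table, using a hypothetical two-line sample of `smartctl -A` output (a real table has many more rows):

```shell
# Hypothetical excerpt of `smartctl -A /dev/sdb` output; attribute name
# is field 2, and the raw value is the last field on the line.
smart='  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0'

# Pull the raw values of the two attributes most tied to media health;
# non-zero raw values here would point at the drive rather than the MB.
echo "$smart" | awk '$2 == "Reallocated_Sector_Ct" || $2 == "Current_Pending_Sector" { print $2, $NF }'
```

Clean zeros on both, as reported in this thread, are consistent with the fault being outside the drives.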
Quote: "Placing one of the 2 TB drives on a USB adapter and exercising it for several hours on another computer yielded no observed errors."
That suggests a problem with the SATA hardware on the motherboard. You could try the inverse test, replacing one of the problem HDDs with a known good HDD from another system.
I ordered two 3TB drives, and when they arrive Monday, I will start that process. I have other machines which I can shakedown and burn in the new drives on.
I did find that if I mount just one of the four large drives, things last longer before the degradation.
Shuffled drives around, and put two new drives on the system. The conclusion I have is that there is a motherboard problem. Both new drives fail after a period of 30 seconds to an hour after boot. All drives pass SMART long test, without any problem (as read upon reboot).
It's a socket 775 processor, so I may be SOL finding another MB that meets my requirements.
This is the final report, I promise. I found that the new drives were failing like the old ones; it was only a matter of time before a drive failed talking to the MB. I suspected that the SATA hardware on the MB was crapping out on me. I tried heating and cooling that area of the MB to see if I could provoke the failure more quickly.
Then I swapped out all the power splitters, followed by some better SATA cables with locks on the ends. Then things started getting better. I checked power and found that when things were getting flaky, the power draw for the box was below 290 W, with a 600 W PS. I checked the DC voltages at various points.
After I swapped out all the data cables, things worked better still, so I did some contact cleaning, etc.
Then I got the system to the point where it would run for an hour without any data problems on a SATA drive. Then three hours, and then I fsck'd the file systems. I added the 3 TB drives, and will be copying things to them tonight.
My conclusion is that the likely culprit was the SATA cables, which were gen-2 cables lacking locks. One, even though it worked better than the others, looked ugly in the connector: there was deformation of the socket that the connector tab on the hard drive, or the plug on the MB, fits into.
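For the check-then-copy step described above, the usual sequence is to fsck each filesystem while unmounted and then copy with something that preserves permissions, links, and extended attributes. The device and mount point names below are hypothetical placeholders, not taken from this system; the runnable part at the bottom is a miniature of the same copy using temporary directories:

```shell
# On the real system (names hypothetical, run as root, filesystem unmounted):
#   fsck -f /dev/sdb1                 # force a full filesystem check
#   rsync -aHAX /mnt/old/ /mnt/new/   # archive mode + hardlinks, ACLs, xattrs

# Runnable miniature of the copy step, using temp directories:
src=$(mktemp -d)
dst=$(mktemp -d)
echo data > "$src/file"
cp -a "$src/." "$dst/"    # -a preserves modes, times, ownership where possible
ls "$dst"
```

The trailing `/.` on the source makes `cp -a` copy the directory's contents rather than nesting the directory itself inside the destination.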
Maybe you are right that "the SATA hardware on the MB was crapping out". Maybe it is weak at reading and writing signals, and the cable replacement and contact cleaning improved the signal transmission enough to move out of the failure region into the mostly-success region. If that's right, a minor degradation of the connections -- which the specification is designed to tolerate -- will bring the failures back in the not-distant-enough future.
After chasing possible drive handling of NCQ and other esoteric issues, I decided to swap out the MB. I had a spare 775 MB and put it in. Unfortunately it needed DDR3 memory, and I had DDR2 memory in that system already. So I borrowed a DDR3 stick from another system, got it up and running, and the SATA behavior is now rock solid.
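For anyone who wants to try the NCQ angle mentioned above before swapping hardware: NCQ can be effectively disabled per drive by forcing the queue depth to 1 through sysfs. A sketch, with `sdb` as a hypothetical suspect drive:

```shell
# List current queue depths for any block devices that expose one;
# the glob simply matches nothing on systems without such devices.
for f in /sys/block/*/device/queue_depth; do
    if [ -f "$f" ]; then
        printf '%s %s\n' "$f" "$(cat "$f")"
    fi
done

# To disable NCQ on a suspect drive (hypothetical sdb, run as root):
#   echo 1 > /sys/block/sdb/device/queue_depth
```

A queue depth of 1 serializes commands to the drive, which rules NCQ in or out as a factor without any reboot or cabling change.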
Hardware problem resolved. No signs of dirt, damage, or anything else wrong with the MB; just an intermittent internal failure of something. Flexing the board a little didn't trigger a failure, so the probability of it being something like a circuit-board feed-through is not real high.
Now I need a project for a 775 MB sans SATA. (grin)
I'll order some DDR3 for this system and find another home for the DDR2 memory I pulled out. I realize I have tons of obsolete memory lying around. I wish it could be melted down and made into bars.