LinuxQuestions.org
Old 11-12-2012, 09:27 AM   #1
linuxbird
Member
 
Registered: Feb 2006
Distribution: Slackware
Posts: 418

Rep: Reputation: 20
Disc goes read-only


On a Slackware 13.37 32 bit system, I have 4 - 2TB SATA drives, and a PATA boot/root drive.

Two of the 2TB SATA drives are changing to some mode, which makes them read-only, and creates input/output errors if they are accessed with NFS. I am not clear as to why this is happening. Looking at dmesg, I see that there is an unhandled error code, and that there may be a lost page write.
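For anyone chasing the same symptom, here is a minimal sketch of how the kernel log could be filtered for the relevant messages. It is not the poster's method. The first two search strings come from the description above; the others ("hard resetting link", "remounting filesystem read-only", "i/o error") are assumptions about typical libata and filesystem wording and may differ between kernel versions. The function name `scan_kernel_log` is hypothetical.

```python
import re
from collections import Counter

# Substrings matched case-insensitively. The first two are quoted from the
# post; the remaining ones are assumed typical kernel wording.
PATTERNS = [
    "unhandled error code",
    "lost page write",
    "hard resetting link",
    "remounting filesystem read-only",
    "i/o error",
]

# Device names worth tallying: libata ports (ata3, ata3.00) and sd disks
# (sdb, or sdb1 counted as sdb).
DEVICE = re.compile(r"\b(ata\d+|sd[a-z]+)")


def scan_kernel_log(text):
    """Return (suspicious_lines, hits_per_device) for a dmesg text dump."""
    suspicious = []
    devices = Counter()
    for line in text.splitlines():
        lowered = line.lower()
        if any(pattern in lowered for pattern in PATTERNS):
            suspicious.append(line)
            # Count each device at most once per line.
            for name in set(DEVICE.findall(line)):
                devices[name] += 1
    return suspicious, devices
```

Feed it the output of `dmesg` (saved to a file, for instance). If the hits cluster on one or two `ataN` ports no matter which drive is attached, that points at the port or its cable rather than the disk.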

If I pull the drives out, and put them in another system, they appear to work fine.

SMART tests are fine.

The power supply is showing good voltages under load.

Reboot brings the drives online. Over time, or use, they go into this reduced mode.
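To confirm which filesystems have actually flipped to read-only, the mount table can be checked. A small sketch, assuming the usual `/proc/mounts` layout (device, mount point, type, comma-separated options); the function name `read_only_mounts` is made up for illustration.

```python
def read_only_mounts(mounts_text, only_dev=True):
    """Return (device, mount_point) pairs whose mount options include 'ro'.

    With only_dev=True, pseudo filesystems (tmpfs, sysfs, ...) are skipped,
    since some of those are mounted read-only on purpose.
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # not a well-formed mount entry
        device, mount_point, _fstype, options = fields[:4]
        if only_dev and not device.startswith("/dev/"):
            continue
        # Compare whole options, so 'errors=remount-ro' is not a false hit.
        if "ro" in options.split(","):
            result.append((device, mount_point))
    return result
```

Typical use would be `read_only_mounts(open("/proc/mounts").read())`, run once right after boot and again after the drives degrade.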

Pointers?
 
Old 11-12-2012, 10:58 AM   #2
malekmustaq
Senior Member
 
Registered: Dec 2008
Location: root
Distribution: Slackware & BSD
Posts: 1,621

Rep: Reputation: 447
With big hard disks, you must understand that the days of the MBR are almost over, overtaken by the sheer size of disk storage today. There is a GNU/Linux way of overcoming this limitation. See for yourself. Read this and this, and many more, about it.

Good luck.
 
Old 11-12-2012, 11:14 AM   #3
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,576
Blog Entries: 31

Rep: Reputation: 1195
Have you tried swapping the SATA signal cables around to eliminate a bad cable as the cause?
 
Old 11-12-2012, 01:29 PM   #4
linuxbird
Member
 
Registered: Feb 2006
Distribution: Slackware
Posts: 418

Original Poster
Rep: Reputation: 20
Quote:
Originally Posted by catkin View Post
Have you tried swapping the SATA signal cables around to eliminate a bad cable as the cause?
Yes, I swapped out two of them, as the existing ones were long, and I had two shorter newer ones that have locks on them.
 
Old 11-12-2012, 01:31 PM   #5
linuxbird
Member
 
Registered: Feb 2006
Distribution: Slackware
Posts: 418

Original Poster
Rep: Reputation: 20
Quote:
Originally Posted by malekmustaq View Post
With big hard disks, you must understand that the days of the MBR are almost over, overtaken by the sheer size of disk storage today. There is a GNU/Linux way of overcoming this limitation. See for yourself. Read this and this, and many more, about it.

Good luck.
All 4 disks are MBR, and the smaller PATA disk is MBR, and less than a TB.

However, this system worked flawlessly for two years, and I am trying to figure out how and why it has decided to send two of the 5 total drives into a toes-up mode.

Thanks for the links on GPT, etc.
 
Old 11-14-2012, 09:27 PM   #6
linuxbird
Member
 
Registered: Feb 2006
Distribution: Slackware
Posts: 418

Original Poster
Rep: Reputation: 20
I've done several passes of smartctl-initiated long tests, and there is no sign of any problem. Replacing the SATA and power cables only seemed to extend the time to failure.
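One thing worth noting: a long self-test runs inside the drive, so it can pass even when the link between drive and controller is misbehaving. Many drives expose an interface CRC counter as SMART attribute 199 (often named UDMA_CRC_Error_Count, though naming varies by vendor). Here is a rough sketch for pulling it out of `smartctl -A` text output, assuming the classic ten-column attribute table; the function names are hypothetical.

```python
import re


def parse_smart_attributes(report):
    """Map attribute ID -> (name, raw_value) from `smartctl -A` text output.

    Assumed columns: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE
    UPDATED WHEN_FAILED RAW_VALUE
    """
    attrs = {}
    for line in report.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID; header lines do not.
        if len(parts) >= 10 and parts[0].isdigit():
            attrs[int(parts[0])] = (parts[1], " ".join(parts[9:]))
    return attrs


def crc_error_count(report):
    """Return the raw value of attribute 199, or None if it is not listed."""
    attr = parse_smart_attributes(report).get(199)
    if attr is None:
        return None
    match = re.match(r"\d+", attr[1])  # leading integer of the raw field
    return int(match.group()) if match else None
```

A raw value that keeps climbing between checks would suggest trouble on the cable or controller side rather than on the platters.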

Placing one of the 2TB drives on a USB adapter, and trying it for several hours on another computer, yielded no observed errors.

The drive seems to take longer to fail when not accessed, or accessed locally, rather than through NFS. However the data is not totally definitive, and the 'studies' are not exactly controlled.
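To make the time-to-failure comparison a bit more controlled, a small write probe could be left running on each mount. The sketch below appends and syncs a timestamp at a fixed interval and reports how long it took until the first write failed. The file path, interval and function name are placeholders, not anything from the setup described here.

```python
import errno
import os
import time


def probe_writes(path, interval_s=5.0, max_probes=None):
    """Append and fsync a timestamp to `path` every `interval_s` seconds.

    Returns (elapsed_seconds, errno_name) at the first failed write, for
    example 'EROFS' or 'EIO', or None if `max_probes` writes all succeed.
    With max_probes=None it runs until a write fails.
    """
    start = time.monotonic()
    count = 0
    while max_probes is None or count < max_probes:
        try:
            with open(path, "a") as fh:
                fh.write(f"{time.time():.0f}\n")
                fh.flush()
                os.fsync(fh.fileno())  # force the data out to the device
        except OSError as exc:
            name = errno.errorcode.get(exc.errno, str(exc.errno))
            return time.monotonic() - start, name
        count += 1
        time.sleep(interval_s)
    return None
```

Running the same probe once against the local mount and once against the NFS mount from a client would give comparable numbers for the two access paths.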

There are no temperature problems, nor apparent power problems, nor dirt accumulation on the MB/SATA controller.

Any other ideas anyone? Perhaps I should have asked on the hardware forum, but I do not consider this strictly a hardware problem yet.

Thanks.
 
Old 11-15-2012, 08:42 AM   #7
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,576
Blog Entries: 31

Rep: Reputation: 1195
Quote:
Originally Posted by linuxbird View Post
Placing one of the 2TB drives on a USB adapter, and trying it for several hours on another computer, yielded no observed errors.
That suggests a problem with the SATA hardware on the motherboard. You could try the inverse test, replacing one of the problem HDDs with a known good HDD from another system.
 
Old 11-15-2012, 09:24 PM   #8
linuxbird
Member
 
Registered: Feb 2006
Distribution: Slackware
Posts: 418

Original Poster
Rep: Reputation: 20
Quote:
Originally Posted by catkin View Post
That suggests a problem with the SATA hardware on the motherboard. You could try the inverse test, replacing one of the problem HDDs with a known good HDD from another system.
I ordered two 3TB drives, and when they arrive Monday, I will start that process. I have other machines on which I can shake down and burn in the new drives.

I did find that if I just mount one of the 4 large drives, things last longer before the degradation.

Thanks.
 
Old 11-20-2012, 08:31 PM   #9
linuxbird
Member
 
Registered: Feb 2006
Distribution: Slackware
Posts: 418

Original Poster
Rep: Reputation: 20
Shuffled drives around, and put two new drives on the system. The conclusion I have is that there is a motherboard problem. Both new drives fail after a period of 30 seconds to an hour after boot. All drives pass SMART long test, without any problem (as read upon reboot).

It's a 775 processor, so I may be SOL finding another MB that meets my requirements.
 
Old 11-21-2012, 09:25 PM   #10
linuxbird
Member
 
Registered: Feb 2006
Distribution: Slackware
Posts: 418

Original Poster
Rep: Reputation: 20
This is the final report, I promise. I found that the new drives were failing like the old. I found that it was a matter of time before drives failed talking to the MB. I suspected that the SATA hardware on the MB was crapping out on me. I tried heating and cooling that area of the MB to see if I could create the failure more quickly.

Then I swapped out all power splitters, followed by some better SATA cables with locks on the ends. Then things started getting better. I checked power and found that when things were getting flaky, the power draw for the box was below 290W, with a 600W PS. I checked the DC voltages at various points.

After I swapped out all the data cables, things started working better. So I did some contact cleaning, etc.

Then I got it so that the system would run for an hour without any data problems with a SATA drive. Then three hours, and then I fsck'd the file systems. I added the 3TB drives, and will be copying things to them tonight.

My conclusion is that the likely culprit was the SATA cables, which were 2gen cables lacking locks. One, even though it worked better than the others, looked ugly in the connector. There was deformation of the socket that the PCB on the hard drive or the socket on the MB plugs into.

I need to find more locking style SATA cables.
 
Old 11-22-2012, 04:13 AM   #11
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,576
Blog Entries: 31

Rep: Reputation: 1195
Maybe you are right about "the SATA hardware on the MB was crapping out". Maybe it is weak at reading and writing signals, and the cable replacement and contact cleaning have improved the signal transmission enough to move it out of the failure region into the mostly-success region. If that's right, a minor degradation of the connections -- which is designed-for in the specification -- will result in failures in the not-distant-enough future.
 
Old 11-25-2012, 08:33 AM   #12
linuxbird
Member
 
Registered: Feb 2006
Distribution: Slackware
Posts: 418

Original Poster
Rep: Reputation: 20
Post note.

After chasing possible drive handling of NCQ and other esoteric issues, I decided to swap out the MB. I had a spare 775 MB, and put it in. Unfortunately it needed DDR3 memory, and I had DDR2 memory already in that system. So I borrowed a DDR3 stick from another system, got it up and running, and the SATA behavior is now rock solid.
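For reference, on the NCQ angle: Linux exposes a per-disk queue depth in sysfs at `/sys/block/sdX/device/queue_depth`. A depth of 1 means NCQ is effectively off for that disk, and root can write 1 there to rule NCQ out as a factor. Below is a minimal sketch that only reads the current values; the function name and the `sysfs_root` parameter are made up here so it can be pointed at a test directory.

```python
import glob
import os


def ncq_queue_depths(sysfs_root="/sys"):
    """Return {disk_name: queue_depth} for every sd* block device found."""
    depths = {}
    pattern = os.path.join(sysfs_root, "block", "sd*", "device", "queue_depth")
    for path in sorted(glob.glob(pattern)):
        # Path layout: <root>/block/<disk>/device/queue_depth
        disk = path.split(os.sep)[-3]
        with open(path) as fh:
            depths[disk] = int(fh.read().strip())
    return depths
```

If the dropouts continue with the depth forced to 1, NCQ handling is unlikely to be the cause, which matches how things turned out here.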

Hardware problem resolved. No signs of dirt, damage or anything to the MB, just an internal intermittent failure of something. Flexing the board a little didn't cause a failure, so the probability of it being something like a circuit board feed through is not real high.

Now I need a project for a 775 MB sans SATA. (grin)

I'll order some DDR3 for this system, and find another home for the DDR2 memory I pulled out. I realize I have tons of obsolete memory lying around. I wish it could be melted and made into bars.
 
  

