LinuxQuestions.org


the dsc 03-06-2010 07:24 PM

Brand new hdd is about to die already? Plus questions about partitioning and backup
 
I'm getting some creepy error messages, and GSmartControl suggests that I back up my data because of some of the SMART attributes (soft read error rate is the most "alarming" one; for some others it says the value of the attribute is non-zero, and while there's no "official" SMART warning yet, it still says there is a risk of future data loss).

This is the sort of kernel message I get:

Quote:

[24071.057018] ata1.00: cmd 25/00:20:ab:c5:42/00:00:25:00:00/e0 tag 0 dma 16384 in
[24071.057018] res 51/40:00:ac:c5:42/40:00:25:00:00/e0 Emask 0x9 (media error)
[24071.057018] ata1.00: status: { DRDY ERR }
[24071.057018] ata1.00: error: { UNC }
[24071.073851] ata1.00: configured for UDMA/133
[24071.073864] ata1: EH complete
[24073.766579] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[24073.766583] ata1.00: BMDMA stat 0x24
[24073.766586] ata1.00: cmd 25/00:20:ab:c5:42/00:00:25:00:00/e0 tag 0 dma 16384 in
[24073.766587] res 51/40:00:ac:c5:42/40:00:25:00:00/e0 Emask 0x9 (media error)
[24073.766589] ata1.00: status: { DRDY ERR }
[24073.766590] ata1.00: error: { UNC }
[24073.799383] ata1.00: configured for UDMA/133
[24073.799397] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
[24073.799399] sd 0:0:0:0: [sda] Sense Key : Medium Error [current] [descriptor]
[24073.799403] Descriptor sense data with sense descriptors (in hex):
[24073.799404] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[24073.799409] 25 42 c5 ac
[24073.799412] sd 0:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed
[24073.799415] end_request: I/O error, dev sda, sector 625132972
[24073.799434] ata1: EH complete
Right now there are no errors (after running fsck), but just a while ago I got this for the second time in two or three days.

Can this all be some funny misunderstanding, or is it time to stop denying that my hdd is in its last days on this Earth? I've just read in this forum that it's very common for even new hdds to fail, and that the hdd is what is most likely to fail on a new PC... I'm a bit boggled, and somewhat skeptical, because before this I was using almost ancient hdds (40 and 80 GB) that never gave this sort of error (I'm still using them in an older PC).

I was looking through the smartmontools pages, and there were some instructions for "manually" fixing some of this. Does anyone know whether that is just the same thing fsck would do automatically anyway, or something "better" that may turn out to show the hdd isn't doomed after all, and that the scariest messages are false positives? (Oh, the blind hope.)
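
For reference, the kind of thing I found on the smartmontools pages starts with reading the SMART attributes and running a long self-test, roughly like this (the drive is /dev/sda here, as in my kernel log; the commands need root):

Code:

# Show SMART health status, attributes and the drive's error log
smartctl -a /dev/sda
# Start an extended (surface) self-test; it runs inside the drive in the background
smartctl -t long /dev/sda
# Later, check the self-test log; a failed test reports the first bad LBA
smartctl -l selftest /dev/sda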



The hdd is a Samsung HD502HI; the failing partition is ext3.



And about partitioning, can it have any effect on this sort of thing?

The way I've done it is the following:
  • 1 - sda1 - primary ntfs partition
  • 2 - sda2 - extended partition, containing:
    • 2a - sda5 (don't ask me about 3 and 4) - linux-swap
    • 2b - sda6 - ext3
    • 2c - sda7 - ext3
    • 2d - sda8 - ext3

The first two ext3 partitions are for filesystem roots, to ease distro/release hopping; the remaining one is /home.

I don't understand much about primary/extended partitions and what the distinction means; I just know that it wasn't possible to have them all as primary, according to the (Linux) partitioner I used, which warned me when I first tried it. Is this partitioning scheme something that is not recommendable, health-wise?



Finally, if this is all really bad and I have to replace the hdd, as I'm afraid I will, can I still back it all up as it is -- including the installed operating systems -- with something like dd in that situation, or is that not advisable?
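
(In case it matters, what I had in mind was a plain whole-disk image, roughly like the sketch below; the backup path is just an example, and I gather GNU ddrescue is often suggested over dd for disks with read errors, since it keeps going and records the bad areas in a map file.)

Code:

# Plain dd image of the whole disk; conv=noerror,sync keeps going past read errors
dd if=/dev/sda of=/mnt/backup/sda.img bs=1M conv=noerror,sync
# GNU ddrescue alternative: copies what it can and logs unreadable areas
ddrescue /dev/sda /mnt/backup/sda.img /mnt/backup/sda.map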



Thanks a lot.

jlinkels 03-06-2010 07:32 PM

If hard disks fail, it is either at the very beginning of their life or after a few years. (Although I have some which have been spinning 24/7 for 15 years.)

You should not rule out a hard disk failure just because the disk is only a few days old; quite the contrary.

jlinkels

bret381 03-06-2010 08:13 PM

I too have seen drives fail within the first week of being installed. It is very possible that this is what is happening.

jschiwal 03-06-2010 08:28 PM

The failure rate of hard drives over time looks like a bathtub curve, especially on the left-hand side: a higher failure rate in the first few months, then it drops down, and then it starts increasing again years later.

A hard drive that moves around a lot, such as in a laptop, may have errors occur more frequently as a result of handling. These errors are often handled in the background, with bad blocks being reallocated.

This message doesn't look good: "auto reallocate failed"
The drive wasn't able to reallocate the bad block.
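
If you want to try fixing it by hand, the procedure described in the smartmontools documentation boils down to writing over the failing sector so the drive remaps it. A rough sketch, using the sector number from your kernel log (this destroys whatever file uses that sector, so identify and restore that file afterwards):

Code:

# DANGER: zeroes sector 625132972 (the LBA from the end_request line), forcing the
# drive to reallocate it from its spare pool. The file using that sector is corrupted.
dd if=/dev/zero of=/dev/sda bs=512 count=1 seek=625132972
# Afterwards, check whether the pending/reallocated sector attributes changed
smartctl -A /dev/sda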

syg00 03-06-2010 08:52 PM

Your partitioning scheme is fine - the four-primary limitation is inherited from DOS. Windows is similarly constrained - which is why the Linux tools are like they are.
New schemes are coming to relieve the bottleneck - GPT.
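
If you want to see how yours is laid out, parted will print the label type ("msdos" for the old DOS scheme, "gpt" for the new one) along with the primary/extended/logical partitions:

Code:

# Print the partition table type and the partition list (run as root)
parted /dev/sda print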

dd is never a good choice for backup - all the more so on a (potentially) flaky disk - because it doesn't tell you about error conditions. Choose a filesystem-aware backup - rsync, cp -a ..., something like fsarchiver might be the go, as it CRCs the data (haven't tried it myself). Or Clonezilla ...
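
For the ext3 partitions, a plain rsync run per mounted filesystem is usually enough; a rough sketch (the destination path is just an example):

Code:

# Copy /home to a backup disk, preserving permissions, ACLs, xattrs and hard links
rsync -avAXH /home/ /mnt/backup/home/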
For the NTFS partition, use Windoze to back it up, or ntfsprogs to handle it manually - moving a Windoze system partition is always a problem.
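
With ntfsprogs, ntfsclone can save just the used clusters of the NTFS partition to an image; roughly (the output path is an example):

Code:

# Save the used clusters of the NTFS filesystem to a special image file
ntfsclone --save-image --output /mnt/backup/sda1-ntfs.img /dev/sda1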

the dsc 04-24-2010 09:52 PM

Thanks, everybody. I ran the Samsung diagnostic tool, which indeed confirmed the problem after the long surface test. I replaced the drive some time ago, and the new one has been running OK since then. :)

abefroman 04-24-2010 10:51 PM

Quote:

Originally Posted by bret381 (Post 3888694)
I too have seen drives fail within the first week of being installed. It is very possible that this is what is happening.

I second this. They could be bad out of the box, go bad in a day or two, or go bad in a month, 2 months, 1 year, 10 years, etc. There is no way to tell.

