I'm getting some creepy error messages, and GSmartControl is advising me to back up my data because of some of them. The soft read error rate is the most "alarming" one; for some other attributes it says the value is non-zero and that, although there is no "official" SMART warning yet, there is still a risk of future data loss.
This is the sort of kernel message I get:
Quote:
[24071.057018] ata1.00: cmd 25/00:20:ab:c5:42/00:00:25:00:00/e0 tag 0 dma 16384 in
[24071.057018] res 51/40:00:ac:c5:42/40:00:25:00:00/e0 Emask 0x9 (media error)
[24071.057018] ata1.00: status: { DRDY ERR }
[24071.057018] ata1.00: error: { UNC }
[24071.073851] ata1.00: configured for UDMA/133
[24071.073864] ata1: EH complete
[24073.766579] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[24073.766583] ata1.00: BMDMA stat 0x24
[24073.766586] ata1.00: cmd 25/00:20:ab:c5:42/00:00:25:00:00/e0 tag 0 dma 16384 in
[24073.766587] res 51/40:00:ac:c5:42/40:00:25:00:00/e0 Emask 0x9 (media error)
[24073.766589] ata1.00: status: { DRDY ERR }
[24073.766590] ata1.00: error: { UNC }
[24073.799383] ata1.00: configured for UDMA/133
[24073.799397] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
[24073.799399] sd 0:0:0:0: [sda] Sense Key : Medium Error [current] [descriptor]
[24073.799403] Descriptor sense data with sense descriptors (in hex):
[24073.799404] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[24073.799409] 25 42 c5 ac
[24073.799412] sd 0:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed
[24073.799415] end_request: I/O error, dev sda, sector 625132972
[24073.799434] ata1: EH complete
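For what it's worth, the numbers in the log seem to agree with each other. Below is a small sketch of how I read them: the four bytes "25 42 c5 ac" at the end of the sense data look like a big-endian sector number, and the same value seems to be packed into the "res" line. The taskfile field order used in lba_from_taskfile is my assumption about how the kernel prints it, and both function names are just made up for this post.

```python
def lba_from_sense(hex_bytes):
    """Interpret space-separated hex bytes as one big-endian integer."""
    return int.from_bytes(bytes.fromhex(hex_bytes), "big")


def lba_from_taskfile(taskfile):
    """Extract the LBA from a 'cmd'/'res' taskfile string.

    Assumed field order (not confirmed by me):
    CMD/FEAT:NSECT:LBAL:LBAM:LBAH/HOB_FEAT:HOB_NSECT:HOB_LBAL:HOB_LBAM:HOB_LBAH/DEV
    """
    _, low, high, _ = taskfile.split("/")
    lbal, lbam, lbah = low.split(":")[2:5]
    hob_lbal, hob_lbam, hob_lbah = high.split(":")[2:5]
    # Most significant byte first, so the joined string is a plain hex number.
    return int(hob_lbah + hob_lbam + hob_lbal + lbah + lbam + lbal, 16)


print(lba_from_sense("25 42 c5 ac"))                             # 625132972
print(lba_from_taskfile("51/40:00:ac:c5:42/40:00:25:00:00/e0"))  # 625132972
```

Both give 625132972, the sector named in the end_request line, so it appears to be the very same sector failing both times rather than errors scattered over the disk.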
Right now there are none (after running fsck), but a while ago I got these messages for the second time in two or three days.
Can this all be some funny misunderstanding, or is it time to stop denying that my hdd is in its last days on this Earth? I've just read in this forum that it's very common for even new hdds to fail, and that the hdd is the part most likely to fail on a new pc... I'm a bit boggled, and somewhat skeptical, because until now I was using almost ancient hdds (40 and 80 GB) that never gave this sort of error (I'm still using them in an older pc).
I was looking through the smartmontools pages, and there were some instructions on how to "manually" fix some of this. Does anyone know if this is just the same thing that fsck would do automatically anyway, or something "better" that may turn out to show that the hdd isn't doomed after all, and that the scariest messages are false positives? (Oh, the blind hope.)
The hdd is a Samsung HD502HI, the failing partition is ext3.
And about partitioning, can it have any effect on this sort of thing?
The way I've done it is the following:
- 1 - sda1 - primary ntfs partition
- 2 - sda2 - extended partition, containing:
- 2a - sda5 (don't ask me about 3 and 4) - linux-swap
- 2b - sda6 - ext3
- 2c - sda7 - ext3
- 2d - sda8 - ext3
The first two ext3 partitions are for filesystem roots, to ease distro/release hopping; the remaining one is /home.
I don't understand much about primary/extended partitions and what they mean. I just know that it wasn't possible to have them all as primary, according to the partitioner I used (from linux), which warned me when I first tried it. Is this partitioning scheme something that is not recommended, health-wise?
Finally, if this is all really bad and I have to replace the hdd, as I'm afraid I will, can I still back it all up as it is (including the installed operating systems) with something like dd in that situation, or is this not advisable?
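To make that last question concrete, this is roughly what I have in mind. It is only a sketch of what I understand dd to do with conv=noerror,sync (carry on after a read error and pad the unreadable block with zeros so everything after it stays at the same offset). The name rescue_copy, the block size, and the read_block parameter are all made up by me, and I haven't tried it on the real disk.

```python
import os


def rescue_copy(src_path, dst_path, block_size=4096, read_block=os.pread):
    """Copy src to dst block by block, zero-filling blocks that cannot be read.

    Returns the byte offsets of the blocks that failed to read.
    """
    bad_offsets = []
    fd = os.open(src_path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size; this also works for block devices on Linux.
        total = os.lseek(fd, 0, os.SEEK_END)
        with open(dst_path, "wb") as out:
            for offset in range(0, total, block_size):
                want = min(block_size, total - offset)
                try:
                    data = read_block(fd, want, offset)
                except OSError:
                    # Unreadable block: remember it and write zeros in its place.
                    bad_offsets.append(offset)
                    data = b""
                # Pad failed or short reads so later data keeps its original offset.
                out.write(data.ljust(want, b"\0"))
    finally:
        os.close(fd)
    return bad_offsets
```

I imagine running something like this from a live CD with the disk unmounted, for example rescue_copy("/dev/sda", "/mnt/backup/sda.img") (both paths are only examples), and then checking the returned offsets to see which parts of the image are zero-filled. Whether an image with holes like that is still a usable backup of the installed systems is exactly what I'm unsure about.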
Thanks a lot.