LinuxQuestions.org - EXT-4 drive suddenly stopped working

EXT-4 drive suddenly stopped working

I was watching a video file from an EXT-4 formatted drive connected via eSATA when, partway through, VLC stopped playing. I closed VLC and reopened the file, only to get an error in the Dolphin status bar that read:

Quote:

An error occurred while accessing 'Videos', the system responded: [close button]

I also get this when attempting to mount the drive.
/var/log/syslog is full of the following errors:

Quote:

2010-07-17 23:14:48 nettop kernel [ 6887.312084] sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
2010-07-17 23:14:48 nettop kernel [ 6887.312092] sd 3:0:0:0: [sdb] CDB: Read(10): 28 00 00 00 00 3a 00 00 08 00
2010-07-17 23:14:48 nettop kernel [ 6887.312113] end_request: I/O error, dev sdb, sector 58
2010-07-17 23:14:48 nettop kernel [ 6887.312201] sd 3:0:0:0: [sdb] Unhandled error code
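
For anyone reading along, the CDB line in that log is the raw SCSI command the kernel sent to the drive. Here is a minimal Python sketch for decoding it, assuming the standard READ(10) layout (opcode 0x28, big-endian LBA in bytes 2-5, transfer length in bytes 7-8). The helper name decode_read10 is made up for illustration and is not part of any tool used in this thread.

```python
def decode_read10(cdb_hex):
    """Return (lba, block_count) from a space-separated READ(10) CDB dump."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) command")
    lba = int.from_bytes(cdb[2:6], "big")    # bytes 2-5: logical block address
    count = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, count


# CDB copied from the syslog excerpt above
lba, count = decode_read10("28 00 00 00 00 3a 00 00 08 00")
print(lba, count)  # 58 8
```

The decoded start address of 58 matches the "sector 58" in the end_request line, so the failing request was an 8-block read very near the start of the disk.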

Attempting an fsck on the drive gives:

Quote:

mark@nettop:~$ sudo fsck -C /dev/sdb
[sudo] password for mark:
fsck from util-linux-ng 2.17.2
e2fsck 1.41.11 (14-Mar-2010)
fsck.ext2: Attempt to read block from filesystem resulted in short read while trying to open /dev/sdb
Could this be a zero-length partition?
mark@nettop:~$

Attempting to run fsck on the partition (/dev/sdb1) instead gives the same output.

What happened here? Has my relatively new 1TB drive suddenly died on me?

Have you tried downloading the hard drive manufacturer's diagnostic software? If something is wrong with the drive, it would find it.

If it finds nothing you can be pretty certain nothing is physically wrong with the drive.

I was able to mount it as normal when I tried later (the computer had been powered off and on again since). However, part way through copying data to it I got errors that it couldn't be written. I kicked off a badblocks on it and came back to find it endlessly spamming out 'Invalid argument during seek' errors.
I ran the latest Samsung diagnostic from UBCD 4.1.1 but while it detected the drive it said that it wasn't supported. I tried the next oldest one and it couldn't detect it.
After that I ran Parted Magic from UBCD 5.0.2. Performing both the basic and extended SMART tests from there didn't find anything, but it did give some interesting information about the last 5 problems, though it doesn't mean much to me. I've attached the output to the post for anyone who's having the same problem or who's interested in taking a look.

The issue is consistent and reproducible and I'm willing to throw a fair bit of time and effort at this problem to solve it because I really don't want all that data to go. Any tests anyone can suggest would be welcome.

Drive looks dead to me. Infant mortality; only 28 hours. Either that, or the time counter has just rolled over, but it claims to have been running for only 28 hours.

I have exactly one suggestion for you, and that is to try Spinrite on the drive. The utility costs $89, but if the drive is at all recoverable, Spinrite will recover it.

If you have a hardware fault in the silicon, spinrite won't help you. If, however, you have some sort of media problem, spinrite might just save you. The errors you are reporting are ambiguous; I am not sure if it is hardware or media (could be either).

Here's the output from the first 10% of a surface scan:

Quote:

Originally Posted by ESTOOL v3.00g

READ SURFACE SCAN

ERROR : LBA 2791870
ERROR : LBA 46762698
ERROR : LBA 64044252
ERROR : LBA 67996122
ERROR : LBA 79449799
ERROR : LBA 81754154
ERROR : LBA 78613688
ERROR : LBA 94748224
ERROR : LBA 100167736
ERROR : LBA 111818124
ERROR : LBA 152019222
ERROR : LBA 156514000
ERROR : LBA 161999830
ERROR : LBA 166965088
ERROR : LBA 177128688

Service code => SJ25 : Test OK

Firmware revision 1AJ100E4
Native size 953869 MB (LBA : 1953525168)
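
The figures in that scan output can be cross-checked with a little arithmetic. The following is a rough Python sketch, assuming 512-byte logical sectors (the scan output does not state the sector size). The helper names are made up for illustration.

```python
SECTOR_BYTES = 512        # assumption: 512-byte logical sectors
TOTAL_LBAS = 1953525168   # "Native size" LBA count from the scan output

# Error LBAs copied from the scan output, in the order reported
ERROR_LBAS = [
    2791870, 46762698, 64044252, 67996122, 79449799,
    81754154, 78613688, 94748224, 100167736, 111818124,
    152019222, 156514000, 161999830, 166965088, 177128688,
]


def size_mib(lbas, sector_bytes=SECTOR_BYTES):
    """Capacity in MiB (2**20 bytes), truncated to a whole number."""
    return lbas * sector_bytes // 2**20


def percent_of_disk(lba, total=TOTAL_LBAS):
    """Position of an LBA as a percentage of the whole disk."""
    return 100.0 * lba / total


print(size_mib(TOTAL_LBAS))                        # 953869
print(len(ERROR_LBAS))                             # 15
print(round(percent_of_disk(max(ERROR_LBAS)), 1))  # 9.1
```

Under that assumption the LBA count agrees with the reported 953869 MB. The 15 errors are spread across roughly the first 9% of the disk, rather than clustered in one spot.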

I also ran ViVard and noted down some numbers that may be relevant that don't look quite right to me. Can anyone tell me if these are normal?

Code:

                            Value  Worst  Thresh   Raw

Raw read error rate           100    100      51     0
Reallocated sector count      252    252      10     0
Seek error rate               252    252      51     0
Reallocation events count     252    252       0     0
Uncorrectable sector count    252    252       0     0
UltraDMA CRC error rate        98     98       0  1388
Write error rate              100    100       0     0

Is this a problem with the disk having too many bad sectors and running out of sectors to swap out? What would cause this?
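
One way to read a table like this is sketched below in Python. It assumes the usual SMART convention that an attribute counts as failed when its normalized value is at or below a non-zero threshold, and that the raw column is a plain event count. Raw values are vendor-specific, so that second assumption may not hold for every attribute. The Attr type and function names are made up for illustration.

```python
from typing import NamedTuple


class Attr(NamedTuple):
    name: str
    value: int
    worst: int
    thresh: int
    raw: int


# Values copied from the table above
ATTRS = [
    Attr("Raw read error rate", 100, 100, 51, 0),
    Attr("Reallocated sector count", 252, 252, 10, 0),
    Attr("Seek error rate", 252, 252, 51, 0),
    Attr("Reallocation events count", 252, 252, 0, 0),
    Attr("Uncorrectable sector count", 252, 252, 0, 0),
    Attr("UltraDMA CRC error rate", 98, 98, 0, 1388),
    Attr("Write error rate", 100, 100, 0, 0),
]


def failed(attrs):
    """Attributes whose normalized value is at or below a non-zero threshold."""
    return [a.name for a in attrs if a.thresh > 0 and a.value <= a.thresh]


def nonzero_raw(attrs):
    """Attributes that have recorded at least one raw event."""
    return [a.name for a in attrs if a.raw != 0]


print(failed(ATTRS))       # []
print(nonzero_raw(ATTRS))  # ['UltraDMA CRC error rate']
```

By this reading none of the attributes has crossed its threshold, and the reallocation counters are all zero, so the table itself does not show the drive running out of spare sectors. The only non-zero raw counter is the UltraDMA CRC error rate. That attribute is commonly associated with transfer errors on the link between drive and controller (cable, connector, enclosure), though these numbers alone don't establish that as the cause here.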

It sounds like your drive is bunk. That sucks, but at least it's still under warranty.

I hope it's still under warranty or I'll never buy another Samsung drive again. I just ran fsck.ext4 on it and got quite a different result from when I ran fsck initially.

Quote:

mark@nettop:~$ sudo fsck.ext4 /dev/sdb
[sudo] password for mark:
e2fsck 1.41.11 (14-Mar-2010)
fsck.ext4: Superblock invalid, trying backup blocks...
fsck.ext4: Bad magic number in super-block while trying to open /dev/sdb

The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>

mark@nettop:~$
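
The 8193 in that message is where the first backup superblock sits on a filesystem with 1 KiB blocks. A large ext4 filesystem more likely uses 4 KiB blocks, which puts the backups elsewhere. Below is a minimal Python sketch of where the backups would be expected. It assumes mke2fs defaults: the sparse_super feature (backups in group 1 and in groups that are powers of 3, 5 and 7) and 8 × block size blocks per group. The function name is made up for illustration. The total block count in the usage lines is also an assumption: it treats the whole 1953525168-sector disk as one filesystem, whereas the real partition would be slightly smaller.

```python
def backup_superblocks(block_size, total_blocks):
    """Expected backup superblock locations, in filesystem blocks.

    Assumes sparse_super and the default of 8 * block_size blocks per group.
    """
    blocks_per_group = 8 * block_size
    # With 1 KiB blocks the first data block is block 1; otherwise block 0.
    first_data_block = 1 if block_size == 1024 else 0
    # Ceiling division to count block groups
    n_groups = -(-(total_blocks - first_data_block) // blocks_per_group)

    groups = {1}
    for base in (3, 5, 7):
        g = base
        while g < n_groups:
            groups.add(g)
            g *= base

    return [g * blocks_per_group + first_data_block
            for g in sorted(groups) if g < n_groups]


# 1 KiB blocks: the first backup is the 8193 that e2fsck suggests
print(backup_superblocks(1024, 100000)[:3])     # [8193, 24577, 40961]

# 4 KiB blocks, assuming the whole 1 TB disk is one filesystem
print(backup_superblocks(4096, 244190646)[:5])  # [32768, 98304, 163840, 229376, 294912]
```

These offsets are relative to the start of whichever device holds the filesystem, so they would apply to the partition (/dev/sdb1) rather than the whole disk (/dev/sdb). The e2fsck man page notes that the actual locations can be listed with mke2fs -n, which only prints what it would do.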

Are my problems likely down to something with EXT4 or the drive itself? Is there any chance at all that the drive itself isn't actually dead before I go out and buy a replacement?