LinuxQuestions.org - "Could this be a zero-length partition?" Two hard disks failing within a week

Hello everyone,

I had a web/mail/SQL server running Fedora with two hard disks: Fedora was installed on the first (sda1), and all backups were kept on the second (sdb1).
When I started having problems with sda1, I began copying files to sdb1. Two weeks later I noticed similar problems and started copying new backup files to sdb1, but as soon as I did, I lost the connection to the server.
In rescue mode it turned out that I could no longer access sda at all: none of the stored data could be read, and the disk no longer showed up in fdisk either.

After many unsuccessful attempts I asked my host to replace the disk, and I installed Ubuntu 10.04 LTS on the new one. Once everything was installed, I tried to restore one of the MySQL backups stored on sdb, but halfway through I started getting errors. I stopped the restore and first took care of some other things (apache2 configuration, etc.).
I then rebooted to make sure that worked as well, and that is when the problems started: I could no longer access the server, and in rescue mode (a live CD provided by my host; this is a dedicated server) sdb no longer showed up.
I reported this to the host, and after they told me they had managed to get the disk back, I booted into rescue mode and tried to check the filesystem and recover bad blocks with e2fsck.

Code:

8 root@rescuecd64 / # e2fsck -y -f -v /dev/sdb1
e2fsck 1.41.3 (12-Oct-2008)
e2fsck: Attempt to read block from filesystem resulted in short read while trying to open /dev/sdb1
Could this be a zero-length partition?
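For context, "short read" means e2fsck asked the device for data and got back fewer bytes than it requested; it says nothing yet about the filesystem contents. An empty (zero-length) partition would produce the same symptom, hence e2fsck's hint. The rough bash sketch below approximates the read e2fsck makes when opening the filesystem, the 1024-byte primary superblock at byte offset 1024, using only dd and wc. The helper name probe_superblock_read is hypothetical and not part of e2fsprogs; it only mimics the symptom, not e2fsck's internals.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of e2fsprogs): read the 1024-byte primary
# superblock at byte offset 1024 and report whether the read came back short.
probe_superblock_read() {
  local dev=$1 want=1024 got
  # dd copies whatever it manages to read; its error messages are discarded.
  # The outer $(( )) strips any padding wc -c may print.
  got=$(( $(dd if="$dev" bs=1024 skip=1 count=1 2>/dev/null | wc -c) ))
  if (( got < want )); then
    echo "short read: got $got of $want bytes from $dev"
    return 1
  fi
  echo "read $got bytes from $dev"
}

# Usage in the rescue system: probe_superblock_read /dev/sdb1
```

On a disk that returns I/O errors from the very first sectors, this would report "got 0 of 1024 bytes".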

Code:

root@rescuecd64 / # mke2fs -n /dev/sdb1
mke2fs 1.41.3 (12-Oct-2008)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
30531584 inodes, 122096000 blocks
6104800 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
3727 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 102400000
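Note that `mke2fs -n` does not look at the existing filesystem; it only prints the layout it would create for a device of this size with the current defaults, which is presumably why it still works while every read fails. The backup list above can be reproduced from those numbers, assuming the sparse_super feature (backups only in group 1 and in groups whose number is a power of 3, 5 or 7). The bash sketch below is an illustration under those assumptions; if the original filesystem was created with a different block size or group size, its real backups sit elsewhere.

```shell
#!/usr/bin/env bash
# Reproduce the "Superblock backups stored on blocks" list above, assuming the
# sparse_super layout: backups live in group 1 and in every group whose number
# is a power of 3, 5 or 7. Values are taken from the mke2fs -n output.
total_blocks=122096000
blocks_per_group=32768
first_data_block=0
# Round up: the last, partial group still counts (gives 3727 here).
group_count=$(( (total_blocks - first_data_block + blocks_per_group - 1) / blocks_per_group ))

# Succeeds if $1 (>= 1) is a power of $2.
is_power_of() {
  local n=$1 base=$2
  while (( n % base == 0 )); do (( n /= base )); done
  (( n == 1 ))
}

backups=()
for (( g = 1; g < group_count; g++ )); do
  if (( g == 1 )) || is_power_of "$g" 3 || is_power_of "$g" 5 || is_power_of "$g" 7; then
    # With 4096-byte blocks the backup copy starts at the first block of its group.
    backups+=( $(( first_data_block + g * blocks_per_group )) )
  fi
done

echo "groups: $group_count"
echo "backups: ${backups[*]}"
```

The script prints 3727 groups and the same 17 block numbers that mke2fs lists. Because every one of them lives on the same device, trying further backups is unlikely to help while that device itself returns read errors.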

Restoring from one of the backup superblocks doesn't work either; I've tried almost all of them:

Code:

1 root@rescuecd64 / # e2fsck -b 32768 -y -f -v /dev/sdb1
e2fsck 1.41.3 (12-Oct-2008)
e2fsck: Attempt to read block from filesystem resulted in short read while trying to open /dev/sdb1
Could this be a zero-length partition?
8 root@rescuecd64 / # e2fsck -b 98304 -y -f -v /dev/sdb1
e2fsck 1.41.3 (12-Oct-2008)
e2fsck: Attempt to read block from filesystem resulted in short read while trying to open /dev/sdb1
Could this be a zero-length partition?
8 root@rescuecd64 / #
8 root@rescuecd64 / # dumpe2fs -f /dev/sdb1 | grep -i superblock
dumpe2fs 1.41.3 (12-Oct-2008)
dumpe2fs: Attempt to read block from filesystem resulted in short read while trying to open /dev/sdb1
Couldn't find valid filesystem superblock.
root@rescuecd64 / # e2fsck -f -b 32768 /dev/sdb1
e2fsck 1.41.3 (12-Oct-2008)
e2fsck: Attempt to read block from filesystem resulted in short read while trying to open /dev/sdb1
Could this be a zero-length partition?
8 root@rescuecd64 / # e2fsck -b 163840 -y -f -v /dev/sdb1
e2fsck 1.41.3 (12-Oct-2008)
e2fsck: Attempt to read block from filesystem resulted in short read while trying to open /dev/sdb1
Could this be a zero-length partition?
8 root@rescuecd64 / # e2fsck -b 229376 -y -f -v /dev/sdb1
e2fsck 1.41.3 (12-Oct-2008)
e2fsck: Attempt to read block from filesystem resulted in short read while trying to open /dev/sdb1
Could this be a zero-length partition?
8 root@rescuecd64 / # e2fsck -b 294912 -y -f -v /dev/sdb1
e2fsck 1.41.3 (12-Oct-2008)
e2fsck: Attempt to read block from filesystem resulted in short read while trying to open /dev/sdb1
Could this be a zero-length partition?
8 root@rescuecd64 / # e2fsck -b 819200 -y -f -v /dev/sdb1
e2fsck 1.41.3 (12-Oct-2008)
e2fsck: Attempt to read block from filesystem resulted in short read while trying to open /dev/sdb1
Could this be a zero-length partition?
8 root@rescuecd64 / # e2fsck -b 884736 -y -f -v /dev/sdb1
e2fsck 1.41.3 (12-Oct-2008)
e2fsck: Attempt to read block from filesystem resulted in short read while trying to open /dev/sdb1
Could this be a zero-length partition?
8 root@rescuecd64 / # e2fsck -b 1605632 -y -f -v /dev/sdb1
e2fsck 1.41.3 (12-Oct-2008)
e2fsck: Attempt to read block from filesystem resulted in short read while trying to open /dev/sdb1
Could this be a zero-length partition?
8 root@rescuecd64 / # e2fsck -b 2654208 -y -f -v /dev/sdb1
e2fsck 1.41.3 (12-Oct-2008)
e2fsck: Attempt to read block from filesystem resulted in short read while trying to open /dev/sdb1
Could this be a zero-length partition?

What are the chances of TWO hard disks failing within a week? Surely something else must be wrong. Could a bad driver in Fedora have damaged both disks? (That would be strange, since I hadn't installed any updates of that kind.)

I should also mention that when I tried to copy the backups from sdb1 to sda1 (the new disk), I got a read-only (??) error and some other I/O errors.

For reference, here are my fstab and mtab as well:
/etc/fstab

Code:

root@rescuecd64 /mnt # cat /mnt/sda/etc/fstab
/dev/sda1              /      ext3    noatime        0 1
/dev/sda2              none    swap    sw              0 0
#/dev/sdb1              /backup ext3    defaults        0 2

I have now commented out sdb1, just to see whether that changes anything at the next restart, but I have no idea how it could be the problem.
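For reference, the sixth fstab field (fs_passno) decides whether fsck checks a filesystem at boot: 1 for the root filesystem, 2 for the others, 0 for never. With the sdb1 line active, its `0 2` would presumably make the boot-time check try to read the failing disk, which might explain the boot trouble; that is an assumption the logs do not confirm. The bash sketch below lists the active entries and their pass numbers; fsck_passes is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every active (uncommented) fstab entry together
# with its 6th field, fs_passno. A non-zero value means fsck checks that
# filesystem at boot (1 for the root filesystem, 2 for the others).
fsck_passes() {
  awk '!/^[[:space:]]*#/ && NF >= 6 { print $1, "pass=" $6 }' "$1"
}

# Usage in the rescue system: fsck_passes /mnt/sda/etc/fstab
```

For the fstab above this prints only sda1 (pass=1) and sda2 (pass=0), since the sdb1 line is commented out.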

/etc/mtab

Code:

1 root@rescuecd64 /mnt # cat /mnt/sda/etc/mtab
/dev/sda1 / ext3 rw,noatime 0 0
none /proc proc rw,noexec,nosuid,nodev 0 0
none /sys sysfs rw,noexec,nosuid,nodev 0 0
none /sys/fs/fuse/connections fusectl rw 0 0
none /sys/kernel/debug debugfs rw 0 0
none /sys/kernel/security securityfs rw 0 0
none /dev devtmpfs rw,mode=0755 0 0
none /dev/pts devpts rw,noexec,nosuid,gid=5,mode=0620 0 0
none /dev/shm tmpfs rw,nosuid,nodev 0 0
none /var/run tmpfs rw,nosuid,mode=0755 0 0
none /var/lock tmpfs rw,noexec,nosuid,nodev 0 0
none /lib/init/rw tmpfs rw,nosuid,mode=0755 0 0
none /var/lib/ureadahead/debugfs debugfs rw,relatime 0 0

I haven't added sdb here yet, but it was there under Fedora, so that can't be the problem.

Edit: Oh, and this is /var/log/messages from when the problems began:

Code:

Jul 14 11:59:07 pitwall kernel: [55258.256241] ip_tables: (C) 2000-2006 Netfilter Core Team
Jul 14 16:29:00 pitwall kernel: [71451.038942] ata2: hard resetting link
Jul 14 16:29:01 pitwall kernel: [71451.560117] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jul 14 16:29:06 pitwall kernel: [71456.562598] ata2: hard resetting link
Jul 14 16:29:06 pitwall kernel: [71457.090107] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jul 14 16:29:06 pitwall kernel: [71457.127577] ata2.00: disabled
Jul 14 16:29:06 pitwall kernel: [71457.127601] ata2: EH complete
Jul 14 16:29:06 pitwall kernel: [71457.127637] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 16:29:06 pitwall kernel: [71457.127640] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 16:29:06 pitwall kernel: [71457.127644] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 03 26 58 ff 00 01 00 00
Jul 14 16:29:06 pitwall kernel: [71457.132863] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 16:29:06 pitwall kernel: [71457.132865] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 16:29:06 pitwall kernel: [71457.132868] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 03 26 59 ff 00 01 00 00
Jul 14 16:29:06 pitwall kernel: [71457.137626] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 16:29:06 pitwall kernel: [71457.137626] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 16:29:06 pitwall kernel: [71457.137626] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 03 26 58 ff 00 00 08 00
Jul 14 16:29:06 pitwall kernel: [71457.144161] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 16:29:06 pitwall kernel: [71457.144163] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 16:29:06 pitwall kernel: [71457.144166] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 03 26 58 ff 00 00 08 00
Jul 14 16:29:06 pitwall kernel: [71457.150233] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 16:29:06 pitwall kernel: [71457.150235] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 16:29:06 pitwall kernel: [71457.150238] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 03 26 58 ff 00 00 08 00
Jul 14 16:29:06 pitwall kernel: [71457.156420] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 16:29:06 pitwall kernel: [71457.156422] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 16:29:06 pitwall kernel: [71457.156425] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 03 26 58 ff 00 00 08 00
Jul 14 16:29:06 pitwall kernel: [71457.162572] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 16:29:06 pitwall kernel: [71457.162575] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 16:29:06 pitwall kernel: [71457.162578] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 03 26 58 ff 00 00 08 00
Jul 14 16:29:06 pitwall kernel: [71457.168719] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 16:29:06 pitwall kernel: [71457.168721] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 16:29:06 pitwall kernel: [71457.168724] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 03 26 58 ff 00 00 08 00
Jul 14 16:29:06 pitwall kernel: [71457.175008] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 16:29:06 pitwall kernel: [71457.175010] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 16:29:06 pitwall kernel: [71457.175013] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 03 26 58 ff 00 00 08 00
Jul 14 16:29:06 pitwall kernel: [71457.181238] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 16:29:06 pitwall kernel: [71457.181240] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 16:29:06 pitwall kernel: [71457.181244] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 03 26 58 ff 00 00 08 00
Jul 14 16:29:06 pitwall kernel: [71457.187377] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 16:29:06 pitwall kernel: [71457.187379] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 16:29:06 pitwall kernel: [71457.187382] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 03 26 58 ff 00 00 08 00
Jul 14 16:29:06 pitwall kernel: [71457.193354] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 16:29:06 pitwall kernel: [71457.193356] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 16:29:06 pitwall kernel: [71457.193359] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 03 26 58 ff 00 00 08 00
...
(the same lines over and over)
...
Jul 14 16:29:07 pitwall kernel: [71457.345981] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 16:29:07 pitwall kernel: [71457.345983] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 16:29:07 pitwall kernel: [71457.345986] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 03 26 58 ff 00 00 08 00
Jul 14 16:29:07 pitwall kernel: [71457.348106] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 16:29:07 pitwall kernel: [71457.348108] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 16:29:07 pitwall kernel: [71457.348111] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 03 26 58 ff 00 00 08 00
Jul 14 16:29:07 pitwall kernel: [71457.350251] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 16:29:07 pitwall kernel: [71457.350253] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 16:29:07 pitwall kernel: [71457.350256] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 03 26 58 ff 00 00 08 00
Jul 14 16:29:08 pitwall kernel: 57.578483] end_request: I/O error, dev sdb, sector 52844799

...

Then the same sequence starts again and repeats over and over. Here is where it ends:

Jul 14 16:31:09 pitwall kernel: [71577.980577] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 16:31:09 pitwall kernel: [71577.980578] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 16:31:09 pitwall kernel: [71577.980580] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 03 26 58 ff 00 00 08 00
Jul 14 16:31:09 pitwall kernel: [71577.981090] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 16:31:09 pitwall kernel: [71577.981091] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 16:31:09 pitwall kernel: [71577.981093] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 03 26 58 ff 00 00 08 00
Jul 14 19:06:09 pitwall kernel: [80879.980158] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 19:06:09 pitwall kernel: [80879.980161] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 19:06:09 pitwall kernel: [80879.980165] sd 1:0:0:0: [sdb] CDB: Write(10): 2a 00 00 00 30 bf 00 00 10 00
Jul 14 19:06:09 pitwall kernel: [80879.983113] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 19:06:09 pitwall kernel: [80879.983115] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 19:06:09 pitwall kernel: [80879.983118] sd 1:0:0:0: [sdb] CDB: Write(10): 2a 00 00 00 30 87 00 00 08 00
Jul 14 19:06:09 pitwall kernel: [80879.986837] lost page write due to I/O error on sdb1
Jul 14 19:50:55 pitwall kernel: Kernel logging (proc) stopped.
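The CDB lines in this log can be decoded by hand. In a 10-byte READ(10)/WRITE(10) command, byte 0 is the opcode (0x28 read, 0x2a write), bytes 2 to 5 hold the logical block address and bytes 7 and 8 the transfer length in blocks, both big-endian. The bash sketch below (decode_cdb is a hypothetical helper) shows that most of the repeated reads target LBA 52844799, the same sector the end_request line reports. Together with "ata2.00: disabled" right before the errors, the DID_BAD_TARGET results suggest the drive had dropped off the SATA link after the resets, so later commands failed at the controller level rather than because of filesystem damage; this is an interpretation, not something the log states outright.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode a 10-byte SCSI READ(10)/WRITE(10) CDB as printed
# by the kernel ("CDB: Read(10): 28 00 ..."). Bytes 2-5 are the big-endian
# logical block address, bytes 7-8 the big-endian transfer length in blocks.
decode_cdb() {
  local -a b=( "$@" )
  local op lba len
  case "${b[0]}" in
    28)    op="READ(10)" ;;
    2a|2A) op="WRITE(10)" ;;
    *)     op="opcode 0x${b[0]}" ;;
  esac
  # bash's base#digits syntax turns the concatenated hex bytes into a number.
  lba=$(( 16#${b[2]}${b[3]}${b[4]}${b[5]} ))
  len=$(( 16#${b[7]}${b[8]} ))
  echo "$op lba=$lba blocks=$len"
}

decode_cdb 28 00 03 26 58 ff 00 01 00 00   # first failed read at 16:29:06
decode_cdb 28 00 03 26 58 ff 00 00 08 00   # the read that keeps repeating
decode_cdb 2a 00 00 00 30 bf 00 00 10 00   # failed write at 19:06:09
```

This prints `READ(10) lba=52844799 blocks=256`, `READ(10) lba=52844799 blocks=8` and `WRITE(10) lba=12479 blocks=16`. Assuming 512-byte sectors, LBA 52844799 lies roughly 27 GB into the disk.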

On the Ubuntu forums it was suggested that I run smartctl, which gave me this:

Code:

root@rescuecd64 / # smartctl -l selftest /dev/sdb
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Short INQUIRY response, skip product id
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

2 root@rescuecd64 / # smartctl -l selftest -T permissive /dev/sdb
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Short INQUIRY response, skip product id
Device does not support Self Test logging

Code:

6 root@rescuecd64 / # dd if=/dev/sdb1 of=/dev/null count=8
dd: reading `/dev/sdb1': Input/output error
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0,000305067 s, 0,0 kB/s
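This dd result is the key clue: not even the first 8 sectors of sdb1 can be read, so every e2fsck, dumpe2fs and backup-superblock attempt above fails at the read itself, before any filesystem data is examined. Once the device is readable again (or on an image copied off it), one quick way to test a given superblock location is to look for the ext2/3/4 magic number 0xEF53, stored little-endian at byte 56 of the superblock. The bash sketch below assumes 4096-byte blocks, so the primary superblock starts at byte 1024 and the backup at block N starts at byte N * 4096; check_sb_magic is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check for the ext2/3/4 magic number (0xEF53) in the
# superblock that starts at byte offset $2 of device or image $1. The magic is
# a little-endian 16-bit field at offset 56 inside the superblock.
check_sb_magic() {
  local dev=$1 sb_offset=$2 magic
  # Read just the two magic bytes and print them as hex, e.g. "53ef".
  magic=$(dd if="$dev" bs=1 skip=$(( sb_offset + 56 )) count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
  if [[ $magic == 53ef ]]; then
    echo "ext2/3/4 magic found at byte $sb_offset"
  else
    echo "no magic at byte $sb_offset (read: '${magic:-nothing}')"
    return 1
  fi
}

# Usage (primary, then the first backup with 4096-byte blocks):
#   check_sb_magic /dev/sdb1 1024
#   check_sb_magic /dev/sdb1 $(( 32768 * 4096 ))
```

On the disk in its current state, both calls would presumably report "nothing", because dd cannot read those bytes at all.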

If anyone can help, I'd be very thankful. Otherwise I'll just have lost a project I started in 2004 :(