LinuxQuestions.org

TonyDeWittePony 07-15-2011 04:12 AM

Could this be a zero-length partition? For 2 hard disks within a week
 
Hello everyone,

Coming from Fedora, I had a Web/Mail/SQL server with two hard disks. One (sda1) held the Fedora install, and all backups were on sdb1.
When I started having issues with sda1, I began copying files to sdb1. Two weeks later I noticed similar issues and started copying new backup files to sdb1, but once I started doing this I lost the connection to the server.
It turned out (in rescue mode) that I couldn't access sda anymore; it couldn't even read the data stored on it. It didn't show up in fdisk anymore either.

So after a lot of trying I asked for the disk to be replaced and installed Ubuntu 10.04 LTS on it. After installing everything, I tried to restore one of the MySQL backups from the sdb disk, but halfway through I started getting errors. I stopped the restore and did some other things I wanted to take care of first (apache2 config etc.).
I then rebooted to make sure that was working as well, and that's when the problems started. I couldn't access the server anymore, and in rescue mode (a live CD my host provides; it's a dedicated server, by the way) I didn't see sdb anymore.
I told them this, and after they said they had managed to get it back, I went into rescue mode and tried to recover bad blocks with the e2fsck command.

Code:

8 root@rescuecd64 / # e2fsck -y -f -v /dev/sdb1
e2fsck 1.41.3 (12-Oct-2008)
e2fsck: Attempt to read block from filesystem resulted in short read while trying to open /dev/sdb1
Could this be a zero-length partition?

Code:

root@rescuecd64 / # mke2fs -n /dev/sdb1
mke2fs 1.41.3 (12-Oct-2008)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
30531584 inodes, 122096000 blocks
6104800 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
3727 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 102400000
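
For reference, the backup locations listed above follow the ext2/ext3 sparse_super layout: a backup superblock sits at the start of block group 1 and of every group whose number is a power of 3, 5, or 7. Below is a minimal Python sketch, assuming that layout and taking the blocks per group, group count, and first data block from the mke2fs -n output. The function name is made up for illustration and is not part of e2fsprogs.

```python
def backup_superblock_blocks(blocks_per_group, group_count, first_data_block=0):
    """Hypothetical helper: block numbers of backup superblocks under sparse_super."""
    # Group 1 always carries a backup (if it exists), plus powers of 3, 5 and 7.
    groups = {1} if group_count > 1 else set()
    for base in (3, 5, 7):
        g = base
        while g < group_count:
            groups.add(g)
            g *= base
    # Each backup sits in the first block of its group.
    return [first_data_block + g * blocks_per_group for g in sorted(groups)]


# Values taken from the mke2fs -n output above.
locations = backup_superblock_blocks(blocks_per_group=32768, group_count=3727)
print(locations)

# Partition size implied by the block count: 122096000 blocks of 4096 bytes each.
print(122096000 * 4096, "bytes")
```

The computed list matches what mke2fs printed, which suggests the -b values are the right ones to try and that any failure to read them lies below the filesystem layer.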

Restoring from a backup superblock doesn't work. I've tried almost all of them:
Code:

1 root@rescuecd64 / # e2fsck -b 32768 -y -f -v /dev/sdb1
e2fsck 1.41.3 (12-Oct-2008)
e2fsck: Attempt to read block from filesystem resulted in short read while trying to open /dev/sdb1
Could this be a zero-length partition?
8 root@rescuecd64 / # e2fsck -b 98304 -y -f -v /dev/sdb1
e2fsck 1.41.3 (12-Oct-2008)
e2fsck: Attempt to read block from filesystem resulted in short read while trying to open /dev/sdb1
Could this be a zero-length partition?
8 root@rescuecd64 / #
8 root@rescuecd64 / # dumpe2fs -f /dev/sdb1 | grep -i superblock
dumpe2fs 1.41.3 (12-Oct-2008)
dumpe2fs: Attempt to read block from filesystem resulted in short read while trying to open /dev/sdb1
Couldn't find valid filesystem superblock.
root@rescuecd64 / # e2fsck -f -b 32768 /dev/sdb1
e2fsck 1.41.3 (12-Oct-2008)
e2fsck: Attempt to read block from filesystem resulted in short read while trying to open /dev/sdb1
Could this be a zero-length partition?
8 root@rescuecd64 / # e2fsck -b 163840 -y -f -v /dev/sdb1
e2fsck 1.41.3 (12-Oct-2008)
e2fsck: Attempt to read block from filesystem resulted in short read while trying to open /dev/sdb1
Could this be a zero-length partition?
8 root@rescuecd64 / # e2fsck -b 229376 -y -f -v /dev/sdb1
e2fsck 1.41.3 (12-Oct-2008)
e2fsck: Attempt to read block from filesystem resulted in short read while trying to open /dev/sdb1
Could this be a zero-length partition?
8 root@rescuecd64 / # e2fsck -b 294912 -y -f -v /dev/sdb1
e2fsck 1.41.3 (12-Oct-2008)
e2fsck: Attempt to read block from filesystem resulted in short read while trying to open /dev/sdb1
Could this be a zero-length partition?
8 root@rescuecd64 / # e2fsck -b 819200 -y -f -v /dev/sdb1
e2fsck 1.41.3 (12-Oct-2008)
e2fsck: Attempt to read block from filesystem resulted in short read while trying to open /dev/sdb1
Could this be a zero-length partition?
8 root@rescuecd64 / # e2fsck -b 884736 -y -f -v /dev/sdb1
e2fsck 1.41.3 (12-Oct-2008)
e2fsck: Attempt to read block from filesystem resulted in short read while trying to open /dev/sdb1
Could this be a zero-length partition?
8 root@rescuecd64 / # e2fsck -b 1605632 -y -f -v /dev/sdb1
e2fsck 1.41.3 (12-Oct-2008)
e2fsck: Attempt to read block from filesystem resulted in short read while trying to open /dev/sdb1
Could this be a zero-length partition?
8 root@rescuecd64 / # e2fsck -b 2654208 -y -f -v /dev/sdb1
e2fsck 1.41.3 (12-Oct-2008)
e2fsck: Attempt to read block from filesystem resulted in short read while trying to open /dev/sdb1
Could this be a zero-length partition?

What are the chances of TWO hard disks failing? Surely there must be something else wrong? Did a bad driver (which would be weird, since I didn't do any updates like that) screw both of my hard disks up in Fedora?

Also, I should note that when I tried to copy the backups from sdb1 to sda1 (the new one), I got a read-only (??) error and some other I/O errors.

Just for information, I'll show fstab and mtab as well:
/etc/fstab
Code:

root@rescuecd64 /mnt # cat /mnt/sda/etc/fstab

/dev/sda1              /      ext3    noatime        0 1
/dev/sda2              none    swap    sw              0 0
#/dev/sdb1              /backup ext3    defaults        0 2

I have now commented out sdb1, just to see whether that changes anything at restart. But I have no idea how this could be the problem.

/etc/mtab
Code:

1 root@rescuecd64 /mnt # cat /mnt/sda/etc/mtab
/dev/sda1 / ext3 rw,noatime 0 0
none /proc proc rw,noexec,nosuid,nodev 0 0
none /sys sysfs rw,noexec,nosuid,nodev 0 0
none /sys/fs/fuse/connections fusectl rw 0 0
none /sys/kernel/debug debugfs rw 0 0
none /sys/kernel/security securityfs rw 0 0
none /dev devtmpfs rw,mode=0755 0 0
none /dev/pts devpts rw,noexec,nosuid,gid=5,mode=0620 0 0
none /dev/shm tmpfs rw,nosuid,nodev 0 0
none /var/run tmpfs rw,nosuid,mode=0755 0 0
none /var/lock tmpfs rw,noexec,nosuid,nodev 0 0
none /lib/init/rw tmpfs rw,nosuid,mode=0755 0 0
none /var/lib/ureadahead/debugfs debugfs rw,relatime 0 0

I haven't added the sdb here yet, but it was here in Fedora so that can't be the problem.

Edit: Oh, and this is /var/log/messages from when the problems emerged:
Code:

Jul 14 11:59:07 pitwall kernel: [55258.256241] ip_tables: (C) 2000-2006 Netfilter Core Team
Jul 14 16:29:00 pitwall kernel: [71451.038942] ata2: hard resetting link
Jul 14 16:29:01 pitwall kernel: [71451.560117] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jul 14 16:29:06 pitwall kernel: [71456.562598] ata2: hard resetting link
Jul 14 16:29:06 pitwall kernel: [71457.090107] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jul 14 16:29:06 pitwall kernel: [71457.127577] ata2.00: disabled
Jul 14 16:29:06 pitwall kernel: [71457.127601] ata2: EH complete
Jul 14 16:29:06 pitwall kernel: [71457.127637] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 16:29:06 pitwall kernel: [71457.127640] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 16:29:06 pitwall kernel: [71457.127644] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 03 26 58 ff 00 01 00 00
Jul 14 16:29:06 pitwall kernel: [71457.132863] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 16:29:06 pitwall kernel: [71457.132865] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 16:29:06 pitwall kernel: [71457.132868] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 03 26 59 ff 00 01 00 00
Jul 14 16:29:06 pitwall kernel: [71457.137626] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 16:29:06 pitwall kernel: [71457.137626] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 16:29:06 pitwall kernel: [71457.137626] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 03 26 58 ff 00 00 08 00
Jul 14 16:29:06 pitwall kernel: [71457.144161] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 16:29:06 pitwall kernel: [71457.144163] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 16:29:06 pitwall kernel: [71457.144166] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 03 26 58 ff 00 00 08 00
Jul 14 16:29:06 pitwall kernel: [71457.150233] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 16:29:06 pitwall kernel: [71457.150235] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 16:29:06 pitwall kernel: [71457.150238] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 03 26 58 ff 00 00 08 00
Jul 14 16:29:06 pitwall kernel: [71457.156420] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 16:29:06 pitwall kernel: [71457.156422] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 16:29:06 pitwall kernel: [71457.156425] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 03 26 58 ff 00 00 08 00
Jul 14 16:29:06 pitwall kernel: [71457.162572] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 16:29:06 pitwall kernel: [71457.162575] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 16:29:06 pitwall kernel: [71457.162578] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 03 26 58 ff 00 00 08 00
Jul 14 16:29:06 pitwall kernel: [71457.168719] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 16:29:06 pitwall kernel: [71457.168721] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 16:29:06 pitwall kernel: [71457.168724] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 03 26 58 ff 00 00 08 00
Jul 14 16:29:06 pitwall kernel: [71457.175008] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 16:29:06 pitwall kernel: [71457.175010] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 16:29:06 pitwall kernel: [71457.175013] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 03 26 58 ff 00 00 08 00
Jul 14 16:29:06 pitwall kernel: [71457.181238] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 16:29:06 pitwall kernel: [71457.181240] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 16:29:06 pitwall kernel: [71457.181244] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 03 26 58 ff 00 00 08 00
Jul 14 16:29:06 pitwall kernel: [71457.187377] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 16:29:06 pitwall kernel: [71457.187379] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 16:29:06 pitwall kernel: [71457.187382] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 03 26 58 ff 00 00 08 00
Jul 14 16:29:06 pitwall kernel: [71457.193354] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 16:29:06 pitwall kernel: [71457.193356] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 16:29:06 pitwall kernel: [71457.193359] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 03 26 58 ff 00 00 08 00
...
(the same lines over and over)
...
Jul 14 16:29:07 pitwall kernel: [71457.345981] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 16:29:07 pitwall kernel: [71457.345983] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 16:29:07 pitwall kernel: [71457.345986] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 03 26 58 ff 00 00 08 00
Jul 14 16:29:07 pitwall kernel: [71457.348106] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 16:29:07 pitwall kernel: [71457.348108] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 16:29:07 pitwall kernel: [71457.348111] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 03 26 58 ff 00 00 08 00
Jul 14 16:29:07 pitwall kernel: [71457.350251] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 16:29:07 pitwall kernel: [71457.350253] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 16:29:07 pitwall kernel: [71457.350256] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 03 26 58 ff 00 00 08 00
Jul 14 16:29:08 pitwall kernel: 57.578483] end_request: I/O error, dev sdb, sector 52844799
...
And then it restarts with the same over and over, here it ends:
Jul 14 16:31:09 pitwall kernel: [71577.980577] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 16:31:09 pitwall kernel: [71577.980578] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 16:31:09 pitwall kernel: [71577.980580] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 03 26 58 ff 00 00 08 00
Jul 14 16:31:09 pitwall kernel: [71577.981090] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 16:31:09 pitwall kernel: [71577.981091] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 16:31:09 pitwall kernel: [71577.981093] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 03 26 58 ff 00 00 08 00
Jul 14 19:06:09 pitwall kernel: [80879.980158] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 19:06:09 pitwall kernel: [80879.980161] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 19:06:09 pitwall kernel: [80879.980165] sd 1:0:0:0: [sdb] CDB: Write(10): 2a 00 00 00 30 bf 00 00 10 00
Jul 14 19:06:09 pitwall kernel: [80879.983113] sd 1:0:0:0: [sdb] Unhandled error code
Jul 14 19:06:09 pitwall kernel: [80879.983115] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 14 19:06:09 pitwall kernel: [80879.983118] sd 1:0:0:0: [sdb] CDB: Write(10): 2a 00 00 00 30 87 00 00 08 00
Jul 14 19:06:09 pitwall kernel: [80879.986837] lost page write due to I/O error on sdb1
Jul 14 19:50:55 pitwall kernel: Kernel logging (proc) stopped.
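
The CDB lines in this log can be decoded by hand. In a 10-byte SCSI READ(10) or WRITE(10) command block, byte 0 is the opcode (0x28 for read, 0x2a for write), bytes 2 to 5 hold the logical block address in big-endian order, and bytes 7 to 8 hold the transfer length in blocks. Below is a small Python sketch that decodes the hex strings as the kernel printed them. The function name is invented here for illustration.

```python
def decode_cdb10(hex_bytes):
    """Hypothetical helper: decode a 10-byte SCSI READ(10)/WRITE(10) CDB given as hex text."""
    cdb = bytes.fromhex(hex_bytes.replace(" ", ""))
    if len(cdb) != 10:
        raise ValueError("expected a 10-byte CDB")
    opcodes = {0x28: "READ(10)", 0x2A: "WRITE(10)"}
    return {
        "op": opcodes.get(cdb[0], hex(cdb[0])),
        "lba": int.from_bytes(cdb[2:6], "big"),     # logical block address
        "blocks": int.from_bytes(cdb[7:9], "big"),  # transfer length in blocks
    }


# Three CDBs copied from the log above.
for cdb_text in ("28 00 03 26 58 ff 00 01 00 00",
                 "28 00 03 26 58 ff 00 00 08 00",
                 "2a 00 00 00 30 bf 00 00 10 00"):
    print(decode_cdb10(cdb_text))
```

The two reads decode to LBA 52844799, the same sector named in the end_request I/O error line, so the kernel was retrying one failing read against a device it had already disabled.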

Then on the Ubuntu forum they suggested I run smartctl, and it gave me this:
Code:

root@rescuecd64 / # smartctl -l selftest /dev/sdb
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Short INQUIRY response, skip product id
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

2 root@rescuecd64 / # smartctl -l selftest -T permissive /dev/sdb
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Short INQUIRY response, skip product id
Device does not support Self Test logging

Code:

6 root@rescuecd64 / # dd if=/dev/sdb1 of=/dev/null count=8
dd: reading `/dev/sdb1': Input/output error
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0,000305067 s, 0,0 kB/s

If anyone can help, I'd be very thankful. Otherwise I'll just have lost a project I started in 2004 :(

business_kid 07-15-2011 09:17 AM

This sounds very bad.
One big error seems to have been in /etc/fstab
Quote:

/dev/sda1 / ext3 noatime 0 1
When were these disks checked? Do you have any other backup?

I would get the two hard disks physically in your hand. You can stick them in a box, run testdisk for older partitions, and e2fsck if you feel brave. Stop working remotely. They have done something. If they made a filesystem over yours, you're basically finished. There is also photorec, but it only finds fragments, and someone has to piece them together.

If there's clicking from one of the drives, stop at once. The card is probably blown. If the disks are the same model, check the code numbers on the cards. I have successfully rescued data from a disk by swapping out the blown card. For any diagnostics, I would get the disk on a cable by itself if at all possible. There are professionals who do this and try harder, and they may get past a rewrite.


All times are GMT -5.