
fiberfarm 11-01-2006 12:53 AM

Assistance using fsck to recover ext3 LVM partition
Hey all,

I am running kernel 2.6.17-1.2186_FC5smp.

I have four 250GB drives, hda, hdb, sda, sdb.

hdb has a boot, root, and swap partition, totalling about 20GB of space. The remainder of hdb is part of a LVM volume group, along with the other three drives.

The LVM logical volume is /dev/VolGroup/Video, configured to mount at /video. It is formatted as ext3.

This box is dedicated to running MythTV. I noticed a few days ago that MythTV wasn't running. My attempts to start the service failed. Upon further investigation I discovered that I couldn't access /video, despite the rest of the computer working fine. It had probably been in this working-but-no-access-to-/video state for several days.

I rebooted.

Upon reboot, fsck fails on the /video partition. It says:

"fsck.ext3: Attempt to read block from filesystem resulted in short read while trying to open /dev/VolGroup/Video. Could this be a zero-length partition?"

I drop to single-user mode at the "Repair filesystem" prompt.

Here things get sticky:

pvscan says "Locking type 1 initialisation failed".

I suspect that perhaps LVM support is not enabled in this single user mode, so I can't run proper diagnostics on the drive.

Attempting to use a different superblock gets me nowhere (not surprising as I don't think this is working properly through LVM):

e2fsck -b 32 /dev/VolGroup/Video

"Bad magic number in super-block while trying to open /dev/VolGroup/Video"

I know not to run fsck on the raw partitions themselves, since that would hose things.

How do I get these LVM volumes grouped together so I can run a proper check of the drive? I'm hoping I can try a different superblock and get things up and running again, but using fsck isn't exactly my specialty.

I have run SMART diagnostics on all the drives and they all pass the long tests.


fiberfarm 11-01-2006 08:49 PM

Okay, for those of you who may encounter a similar problem, here's how I resolved this:

I booted off the CD and went into rescue mode ("linux rescue" at the boot prompt).

I then made a backup of my fstab.

I then edited the fstab, and removed the reference to the LVM volume group (/video in my case).
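For anyone who would rather script the backup-and-edit step, here is a rough sketch. The function name disable_mount is made up for illustration, and the /mnt/sysimage path in the example assumes the rescue environment mounted the installed system there, so check where yours actually ended up. It comments the line out rather than deleting it, which is a slightly different approach from the one I took by hand.

```shell
#!/bin/sh
# Sketch only: back up an fstab-style file, then comment out the entry
# for one mount point. disable_mount is a hypothetical helper name.
# Usage: disable_mount FILE MOUNTPOINT
disable_mount() {
    fstab=$1
    mnt=$2
    # Keep an untouched copy to restore once the repair is done.
    cp -p "$fstab" "$fstab.bak" || return 1
    # Prefix '#' to any non-comment line whose second field is the mount point.
    awk -v m="$mnt" '$1 !~ /^#/ && $2 == m { print "#" $0; next } { print }' \
        "$fstab.bak" > "$fstab"
}

# Example (as root; the path is an assumption about the rescue layout):
#   disable_mount /mnt/sysimage/etc/fstab /video
```

Restoring afterwards is just a matter of copying the .bak file back over the edited one.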

I then rebooted normally.

Everything of course came back up just fine, but without the /video partition - but with LVM support.
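If you want to confirm that the logical volume really is visible before going further, something like the following may help. It is only a sketch: lv_is_active is a hypothetical helper, and it assumes lvscan prints lines of the form ACTIVE '/dev/VolGroup/Video' [size] inherit, which is what I would expect from LVM2 but may differ between versions.

```shell
#!/bin/sh
# Sketch only: read lvscan output on stdin and succeed if the named
# logical volume is listed as ACTIVE. lv_is_active is a hypothetical name.
# Usage: lvscan | lv_is_active /dev/VolGroup/Video
lv_is_active() {
    # lvscan quotes the device path, so compare against the quoted form.
    awk -v lv="'$1'" '$1 == "ACTIVE" && $2 == lv { found = 1 }
                      END { exit !found }'
}

# Example (as root): activate the volume group only if the volume is not active yet.
#   lvscan | lv_is_active /dev/VolGroup/Video || vgchange -ay VolGroup
```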

I was then able to run fsck on the volume.

First I located the backup superblocks on my system like so:

mke2fs -n /dev/VolGroup/Video

The -n option makes it simply show what it WOULD do if you asked it to create a new file system, but doesn't actually create the file system itself.

From the list of backup superblocks this generated, I picked the lowest one (I have no idea if it makes a difference which one you choose, I guessed, it worked).
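Picking the numbers out of the mke2fs -n output by eye works fine, but here is a rough sketch of pulling them out with awk. The function name list_backup_superblocks is made up, and the sketch assumes the output contains a "Superblock backups stored on blocks:" line followed by comma-separated block numbers, which is what I saw on my system.

```shell
#!/bin/sh
# Sketch only: read `mke2fs -n` output on stdin and print the backup
# superblock locations, one per line. Hypothetical helper name.
list_backup_superblocks() {
    awk '
      /^Superblock backups stored on blocks:/ {
          grab = 1            # start collecting from this line on
          hdr = 1             # remember that this is the header line itself
          sub(/^[^:]*:/, "")  # drop the header text, keep anything after the colon
      }
      grab {
          # The list ends at the first later line that contains no digits.
          if (!hdr && $0 !~ /[0-9]/) { grab = 0; next }
          hdr = 0
          gsub(/,/, " ")
          for (i = 1; i <= NF; i++)
              if ($i ~ /^[0-9]+$/) print $i
      }'
}

# Example (as root; -n means nothing is written to the volume):
#   mke2fs -n /dev/VolGroup/Video | list_backup_superblocks | sort -n | head -n 1
```

The last pipeline prints the lowest backup location, which is the one I ended up using.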

I then ran fsck using the -b option to specify the backup superblock to use:

fsck -b 32768 /dev/VolGroup/Video

This then went through and found all the problems in the drive and fixed them. Note that you might want to use the "yes" option (-y) to have it automatically say yes to everything - I'm paranoid and didn't, but it did mean hitting "y" about 4,983 times - good thing for fast key repeat rates. :-)
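If you want a middle ground between answering every prompt by hand and blindly saying yes, one option is a read-only pass first to see how bad things are, then a second pass that repairs. The sketch below wraps that up; run_fsck and the DRY_RUN variable are names I made up for illustration, and it calls e2fsck directly rather than going through fsck as I did.

```shell
#!/bin/sh
# Sketch only: run e2fsck against a backup superblock, either read-only
# or in repair mode. run_fsck and DRY_RUN are hypothetical names.
# Usage: run_fsck DEVICE SUPERBLOCK check|repair
run_fsck() {
    dev=$1
    sb=$2
    mode=$3
    case $mode in
        check)  flag=-n ;;   # open read-only and answer "no" to every prompt
        repair) flag=-y ;;   # answer "yes" to every prompt
        *) echo "usage: run_fsck DEVICE SUPERBLOCK check|repair" >&2
           return 2 ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Only show what would be run.
        echo "e2fsck $flag -b $sb $dev"
    else
        e2fsck "$flag" -b "$sb" "$dev"
    fi
}

# Example (as root, with the volume unmounted):
#   run_fsck /dev/VolGroup/Video 32768 check
#   run_fsck /dev/VolGroup/Video 32768 repair
```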

Once fsck was done fixing everything, I put back my backup copy of fstab which included the reference to my /video mount and rebooted.

Voila! All data back and happy.

At this point I backed everything up and am investigating the cause of this problem. dmesg has some scary stuff about hard drive errors in it, so I suspect I have a hard drive on the verge of death. Never mind that they are all less than 3 months old. Grr. In any case, I was successful in recovering 100% of the data I needed. (Note that I already had a backup of these files, so I only had to get about 40GB out of the 900GB: just the files that were new or changed since the last backup. I suspect some data may have been lost in this process, but apparently none that I needed.)
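For anyone wanting to sift their own dmesg for this sort of thing, here is a rough filter. The function name disk_errors is made up, and the patterns are only my guesses at what typical IDE/SATA error lines look like, not a complete or authoritative list, so adjust them to whatever your kernel actually prints.

```shell
#!/bin/sh
# Sketch only: read kernel log text on stdin and print lines that look
# like disk errors. disk_errors is a hypothetical helper name, and the
# patterns below are assumptions, not an exhaustive list.
disk_errors() {
    grep -Ei 'I/O error|medium error|DriveReady|SeekComplete|UncorrectableError|ata[0-9]+.*(error|exception)'
}

# Example:
#   dmesg | disk_errors
```

An empty result does not prove the drives are healthy; it only means none of these particular patterns showed up.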

Also, in the process of resolving this issue, someone pointed out to me that I am mixing both SATA and PATA drives in the same LVM volume group, plus multiple brands with minor size differences. LVM seems to support this, but several people told me they suspect it is the cause of this failure. If anyone has anything else to add to that, let me know. I suspect a hardware failure, not a configuration problem, is the cause here.

Next up: Replacing this unreliable LVM mess with a software RAID 5 solution. Gonna need some bigger drives.....

All times are GMT -5.