Assistance using fsck to recover ext3 LVM partition
I am running kernel 2.6.17-1.2186_FC5smp.
I have four 250GB drives, hda, hdb, sda, sdb.
hdb has a boot, root, and swap partition, totalling about 20GB of space. The remainder of hdb is part of an LVM volume group, along with the other three drives.
The logical volume in that group (/dev/VolGroup/Video) is formatted as ext3 and is configured to mount at /video.
This box is dedicated to running MythTV. I noticed a few days ago that MythTV wasn't running, and my attempts to start the service failed. On further investigation I discovered that I couldn't access /video, even though the rest of the computer was working fine. It had probably been in this working-but-no-access-to-/video state for several days.
Upon reboot, fsck fails on the /video partition. It says:
"fsck.ext3: Attempt to read block from filesystem resulted in short read while trying to open /dev/VolGroup/Video. Could this be a zero-length partition?"
I get dropped to the single-user "Repair filesystem" prompt.
Here things get sticky:
pvscan says "Locking type 1 initialisation failed".
I suspect that LVM support is not enabled in this single-user mode, so I can't run proper diagnostics on the drive.
Trying a backup superblock gets me nowhere (not surprising, since I don't think LVM is working properly at this point):
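For what it's worth, "Locking type 1 initialisation failed" usually means LVM could not create its lock file, most often because the repair shell leaves the root filesystem mounted read-only. Below is a minimal sketch of a check for that; `lvm_lock_dir_ok` is a hypothetical helper, not an LVM command, and it assumes the common default `locking_dir = "/var/lock/lvm"` from /etc/lvm/lvm.conf, so check yours.

```shell
# Hypothetical helper: lvm_lock_dir_ok [DIR]
# Succeeds if a file can be created in DIR (or in its parent when DIR does
# not exist yet), which is what file-based LVM locking (locking_type = 1)
# needs. ASSUMPTION: DIR defaults to /var/lock/lvm; check locking_dir in
# your /etc/lvm/lvm.conf.
lvm_lock_dir_ok() {
    dir=${1:-/var/lock/lvm}
    [ -d "$dir" ] || dir=$(dirname "$dir")
    probe="$dir/.lvm_probe.$$"
    # Actually creating a file catches read-only mounts, even for root.
    if touch "$probe" 2>/dev/null; then
        rm -f "$probe"
        return 0
    fi
    return 1
}
```

If it fails, remounting root read-write (`mount -o remount,rw /`) and then running `vgchange -ay` might activate the volume group straight from the repair prompt; LVM2's `vgchange` also accepts `--ignorelockingfailure`. Neither was tried here.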
e2fsck -b 32 /dev/VolGroup/Video
"Bad magic number in super-block while trying to open /dev/VolGroup/Video"
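On why `-b 32` could not have worked: e2fsck(8) notes that the first backup superblock usually sits at block 8193, 16384 or 32768 on filesystems with 1KB, 2KB or 4KB blocks, and 32 is none of those (though with the logical volume not activated, probably no -b value would have helped). A sketch of that arithmetic, assuming the default of 8 * block-size blocks per group (one bitmap block's worth of bits); `first_backup_sb` is a hypothetical name.

```shell
# Hypothetical helper: first_backup_sb BLOCKSIZE
# Prints the block number of the first backup superblock (start of block
# group 1) for an ext2/ext3 filesystem made with default settings.
# ASSUMPTION: blocks per group = 8 * BLOCKSIZE (the mke2fs default).
first_backup_sb() {
    bs=$1
    per_group=$((bs * 8))
    if [ "$bs" -eq 1024 ]; then
        # With 1KB blocks the first data block is 1 (the primary superblock
        # sits at byte offset 1024), so every group starts one block later.
        echo $((per_group + 1))
    else
        echo "$per_group"
    fi
}
```

For example, `for bs in 1024 2048 4096; do first_backup_sb "$bs"; done` prints 8193, 16384 and 32768.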
I know not to run fsck on the raw partitions themselves; that would certainly hose things.
How do I get the LVM volume group assembled and active so I can run a proper check of the filesystem? I'm hoping I can try a different superblock and get things running again, but fsck isn't exactly my specialty.
I have run SMART diagnostics on all the drives and they all pass the long tests.
Okay, for those of you who may encounter a similar problem, here's how I resolved this:
I booted off the CD and went into rescue mode ("linux rescue" at the boot prompt).
I then made a backup of my fstab.
I then edited the fstab, and removed the reference to the LVM volume group (/video in my case).
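The back-up-and-edit of fstab can be scripted. This is a minimal sketch; `disable_fstab_entry` and `restore_fstab` are hypothetical helper names, and it assumes the mount point is the second whitespace-separated field, as in a standard fstab. It comments the entry out rather than deleting it.

```shell
# Hypothetical helpers: disable_fstab_entry FSTAB MOUNTPOINT / restore_fstab FSTAB
# disable_fstab_entry copies FSTAB to FSTAB.bak, then rewrites FSTAB with
# every active line whose mount point (second field) is MOUNTPOINT commented out.
disable_fstab_entry() {
    fstab=$1
    mnt=$2
    cp -p "$fstab" "$fstab.bak" || return 1
    if ! awk -v mnt="$mnt" '
        # comment out live entries whose mount point (field 2) matches
        $1 !~ /^#/ && $2 == mnt { print "#" $0; next }
        { print }
    ' "$fstab.bak" > "$fstab"; then
        cp -p "$fstab.bak" "$fstab"   # put the original back if awk failed
        return 1
    fi
}

# restore_fstab FSTAB: move the saved FSTAB.bak back into place.
restore_fstab() {
    mv "$1.bak" "$1"
}
```

From the rescue shell, where Fedora typically mounts the installed system under /mnt/sysimage, that would be something like `disable_fstab_entry /mnt/sysimage/etc/fstab /video`, and after the normal reboot and fsck, `restore_fstab /etc/fstab`.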
I then rebooted normally.
Everything came back up fine, of course, without the /video partition but with LVM support.
I was then able to run fsck on the volume.
First, I located the backup superblocks with:
mke2fs -n /dev/VolGroup/Video
The -n option makes it simply show what it WOULD do if you asked it to create a new file system, but doesn't actually create the file system itself.
From the list of backup superblocks this generated, I picked the lowest one (I have no idea whether it makes a difference which one you choose; I guessed, and it worked).
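Picking the backup superblock can be scripted too. On the "which one" question: e2fsck(8) says that when a backup is given with -b and the check isn't read-only, the primary superblock is updated at the end, so any intact backup should do and the lowest is a sensible default. Per mke2fs(8), -n only reports the right locations if you pass the same parameters (block size and so on) that were used to create the filesystem. `backup_superblocks` is a hypothetical helper that assumes the "Superblock backups stored on blocks:" layout mke2fs prints.

```shell
# Hypothetical helper: backup_superblocks
# Reads `mke2fs -n` output on stdin and prints each backup superblock
# location on its own line, in the order mke2fs lists them (ascending).
backup_superblocks() {
    awk '
        /Superblock backups stored on blocks:/ { grab = 1; sub(/.*blocks:/, "") }
        grab {
            # stop at the first line holding anything but digits, commas and blanks
            if ($0 !~ /^[0-9, \t]*$/) exit
            # or at a blank line once the list has started
            if ($0 ~ /^[ \t]*$/ && seen) exit
            n = split($0, parts, /[^0-9]+/)
            for (i = 1; i <= n; i++)
                if (parts[i] != "") { print parts[i]; seen = 1 }
        }
    '
}
```

Usage would be along the lines of `mke2fs -n /dev/VolGroup/Video | backup_superblocks | head -n 1`.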
I then ran fsck using the -b option to specify the backup superblock to use:
fsck -b 32768 /dev/VolGroup/Video
This went through, found all the problems on the volume and fixed them. Note that you might want to use the "yes" option (-y) to have it automatically answer yes to everything. I'm paranoid and didn't, but that meant hitting "y" about 4,983 times; good thing for fast key repeat rates. :-)
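Whether a run really fixed everything shows up in its exit status, which e2fsck(8) and fsck(8) document as a sum of bit flags: 1 = errors corrected, 2 = reboot needed, 4 = errors left uncorrected, 8 = operational error, 16 = usage or syntax error, 32 = cancelled by user, 128 = shared library error. A minimal decoder sketch; `describe_fsck_status` is a hypothetical name.

```shell
# Hypothetical helper: describe_fsck_status CODE
# Prints one line per bit set in an fsck/e2fsck exit status, using the
# meanings documented in e2fsck(8).
describe_fsck_status() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    [ $((code & 1)) -ne 0 ]   && echo "filesystem errors corrected"
    [ $((code & 2)) -ne 0 ]   && echo "system should be rebooted"
    [ $((code & 4)) -ne 0 ]   && echo "filesystem errors left uncorrected"
    [ $((code & 8)) -ne 0 ]   && echo "operational error"
    [ $((code & 16)) -ne 0 ]  && echo "usage or syntax error"
    [ $((code & 32)) -ne 0 ]  && echo "cancelled by user request"
    [ $((code & 128)) -ne 0 ] && echo "shared library error"
    return 0   # the last test above may be false; the helper itself succeeded
}
```

For example: `fsck -b 32768 -y /dev/VolGroup/Video; describe_fsck_status $?`.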
Once fsck was done fixing everything, I put back my backup copy of fstab which included the reference to my /video mount and rebooted.
Voila! All data back and happy.
At this point I backed everything up and am investigating the cause of this problem. dmesg has some scary stuff about hard drive errors in it, so I suspect I have a hard drive on the verge of death. Never mind that they are all less than 3 months old. Grr. In any case, I succeeded in recovering all of the data I needed. (Note that I had a backup of these files, so I only had to get about 40GB of the 900GB back: just the new files that had changed since the last backup. I suspect some data may have been lost in this process, but apparently none that I needed.)
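To narrow down which drive is dying, the kernel log can be tallied per device. This sketch assumes failed block requests are logged as "... I/O error, dev hdb, sector ...", the 2.6-era `end_request:` wording (later kernels use a different prefix but keep "I/O error, dev NAME,"); IDE-specific lines such as `hdb: dma_intr: ...` are not counted. `io_errors_by_device` is a hypothetical name.

```shell
# Hypothetical helper: io_errors_by_device
# Reads kernel log text on stdin and prints "COUNT DEVICE" for every device
# named in an "I/O error, dev NAME," message, most errors first.
io_errors_by_device() {
    sed -n 's|.*I/O error, dev \([A-Za-z0-9][A-Za-z0-9]*\),.*|\1|p' |
        sort | uniq -c | sort -rn
}
```

Running `dmesg | io_errors_by_device` and then looking at `smartctl -a` for whichever device tops the list may be worthwhile even though the long self-tests passed.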
Also, while I was resolving this issue, someone pointed out to me that I am mixing SATA and PATA drives in the same LVM volume group, plus multiple brands with minor size differences. LVM seems to support this, but several people told me they suspect it is the cause of this failure. If anyone has anything else to add to that, let me know. I suspect a hardware failure, not a configuration problem, is the cause here.
Next up: Replacing this unreliable LVM mess with a software RAID 5 solution. Gonna need some bigger drives.....