LVM nightmare: how do I remove FUBAR HDDs?
A couple of weeks ago I stuck a new drive in my LVM array. It worked fine for about 3 days, then started making that horrible click-scrape, click-scrape noise you get when the heads die.
I've been struggling since then to remove the bloody thing from my array, but whenever I boot it up it either doesn't find the drive, or it only finds it briefly.
This means I can't just vgreduce it out of the volume group.
I'm about at my wits' end.
If I remove the hdd, is there any way I can recover the rest of the array?
atm pvscan (or any other LVM command) just tells me it can't find all the PVs in my VG and so can't do anything.
I've done quite a lot of googling and haven't been able to find anything other than hints that I might be able to forcibly remove the PVs, but I can't find anything that tells me HOW.
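For what it's worth, the LVM2 man pages do describe a forced removal: vgchange can activate a VG with PVs missing, and vgreduce has a --removemissing option. A minimal sketch, assuming a volume group called vg0 (a hypothetical name); it only prints the commands unless DRY_RUN=0 is set, since running them for real changes the VG metadata:

```shell
#!/bin/sh
# Sketch only. "vg0" is a hypothetical VG name, not one from this thread.
# run() prints each command; set DRY_RUN=0 to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "+ $*"; fi
}

remove_missing_pv() {
    vg="$1"
    # Activate whatever can still be assembled without the missing PV,
    # so surviving data can be copied off first.
    run vgchange -ay --partial "$vg"
    # Drop the missing PV(s) from the VG metadata. Adding --force also
    # removes any LVs that had extents on the missing PV.
    run vgreduce --removemissing "$vg"
}

remove_missing_pv vg0
```

Anything that lived on the dead disk is gone either way; this only gets the VG back into a state the tools will work with.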
"Computer, if you don't open that hatch right now I'm going to go to your major databanks with a large axe and give you a reprogramming you'll never forget!"
I don't use LVM (for several reasons, you may have just found another one), but the man page for vgcfgrestore indicates you might be able to work around this if you are replacing the disk. Got a(nother) spare?
This bit is a bit of a worry ...
LVM is *not* for data integrity - you still need backups. You'll need to restore the LV(s) once the VG is re-built.
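As I read the vgcfgrestore man page, the replacement route is roughly: give the new disk the UUID of the missing PV, then restore the VG metadata from the backup file. A rough sketch under those assumptions; the VG name, UUID and device below are all placeholders, and the man page talks about substituting a disk of the same size. It prints the commands unless DRY_RUN=0 is set:

```shell
#!/bin/sh
# Sketch only. vg0, MISSING-PV-UUID and /dev/sdX are placeholders.
# run() prints each command; set DRY_RUN=0 to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "+ $*"; fi
}

replace_missing_pv() {
    vg="$1"; uuid="$2"; newdev="$3"
    # LVM keeps the latest metadata backup here by default; the UUID of
    # the missing PV is listed inside this file.
    backup="${4:-/etc/lvm/backup/$vg}"
    # Initialise the new disk with the same UUID as the missing PV.
    run pvcreate --uuid "$uuid" --restorefile "$backup" "$newdev"
    # Restore the VG metadata, then activate the VG.
    run vgcfgrestore -f "$backup" "$vg"
    run vgchange -ay "$vg"
}

replace_missing_pv vg0 MISSING-PV-UUID /dev/sdX
```

This only rebuilds the metadata. The extents that sat on the dead disk come back empty, which is why the LV contents still have to be restored from backup.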
Not able to help you with this issue, but when you go to rebuild I can tell you how to do it without LVM and get almost all of the same functionality (see below). The only feature that cannot be handled in other (simpler, safer) ways is contiguous space (one large file across multiple drives). You have just found one of the MANY PITA problems with LVM, which for most users offers no extra functionality.
Unfortunately the disk in question is a 1.5TB, and from what I've been reading these things are flaky as hell (as I'm now discovering first hand), so I was planning on replacing it with a 1TB.
Also, the reason I use LVM is for its ability to spread a partition (currently 5.6TB) across multiple disks.
I'm beginning to wish I'd just broken the tree down into a less elegant but more stable solution:
/blah/1 --> sda
/blah/2 --> sdb
/blah/3 --> sdc
Unless you can suggest a better option?
In the meantime I'll have a look into vgcfgrestore..
Yep, keep each drive independent. That way, if any drive fails you only have to deal with that one drive. You can set up your mount points on the master drive (sda usually) in such a manner that the casual user cannot tell they are actually using a different drive. I have /home on a separate drive (mount point on master) and I have a /data subdirectory in each user's home that is in reality on a third drive. Once you understand mount points and using fstab it is a breeze to set this up or rearrange it.
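To give an idea of what that looks like, here is a small sketch that prints fstab-style lines for that kind of layout. The device names, the user name "alice" and the ext3 filesystem type are all made up for illustration, and "defaults 0 2" is just a common choice of options and fsck pass:

```shell
#!/bin/sh
# Sketch only: prints candidate /etc/fstab lines, it does not edit fstab.
# Devices, the user name and the filesystem type are hypothetical.
fstab_line() {
    # Arguments: device, mount point, filesystem type.
    printf '%s\t%s\t%s\tdefaults\t0\t2\n' "$1" "$2" "$3"
}

fstab_line /dev/sdb1 /home ext3             # /home on the second drive
fstab_line /dev/sdc1 /home/alice/data ext3  # per-user data on a third drive
```

Order matters here: /home has to be mounted before anything underneath it, so its line goes first.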
Ok, vgcfgrestore can restore the array to the state recorded in a backup file.
Unfortunately the backup doesn't go far enough back.. :(
It looks like it's not 1 disk that's died, but 2.
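LVM normally keeps older copies of the metadata as well as the latest backup, and vgcfgrestore can list them before you pick one. A minimal sketch, assuming a VG called vg0 and an archive file name that is purely illustrative; it prints the commands unless DRY_RUN=0 is set:

```shell
#!/bin/sh
# Sketch only. vg0 and the archive file name are hypothetical.
# run() prints each command; set DRY_RUN=0 to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "+ $*"; fi
}

restore_vg_metadata() {
    vg="$1"; file="${2:-}"
    # Show the backup and archive files LVM holds for this VG,
    # with a description and timestamp for each.
    run vgcfgrestore --list "$vg"
    # Only restore when a specific file has been chosen.
    if [ -n "$file" ]; then
        run vgcfgrestore -f "$file" "$vg"
    fi
}

restore_vg_metadata vg0                                  # just list
restore_vg_metadata vg0 /etc/lvm/archive/vg0_example.vg  # list, then restore
```

How far back the archive goes depends on the retention settings in lvm.conf, so it may still not reach the state you want.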
Unfortunately, this one partition is the equivalent of your /data dir. I used to have it broken down into categories, but the volume of data has outgrown that solution. Some categories are WAY bigger than 1TB, others are only 200GB.
That's why I turned to lvm in the first place.
If you are getting that big, skip LVM and go straight to RAID. Go ahead and spend the $300 for a true hardware RAID card. I know it seems like a ton of money, but in the long run it will save you time and hair.
Reet, managed to get it sorted, mostly..
I used vgcfgrestore to revert to how it was before the last drive was put in. Then physically removed that drive from the box.
The other drive, which is dying (but not yet dead), I've managed to deal with by running lvresize -L-1.4T /dev/vg/lv. Now I'm just trying to get resize2fs to shrink the fs to match the partition size (then I can see how much data I've lost :( )
thank fsck for backups..
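For anyone shrinking an ext2/3/4 filesystem on a healthy LV, the usual order is the reverse of the lvresize above: shrink the filesystem first, then the LV, so the LV never ends up smaller than the filesystem on it. A sketch under those assumptions; /dev/vg/lv is the path from this thread, the 4200G target is a made-up figure, and it prints the commands unless DRY_RUN=0 is set:

```shell
#!/bin/sh
# Sketch only. The target size is hypothetical.
# run() prints each command; set DRY_RUN=0 to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "+ $*"; fi
}

shrink_lv() {
    lv="$1"; newsize="$2"
    # ext filesystems have to be unmounted and checked before shrinking.
    run umount "$lv"
    run e2fsck -f "$lv"
    # Shrink the filesystem first...
    run resize2fs "$lv" "$newsize"
    # ...and only then cut the LV down to the same size.
    run lvreduce -L "$newsize" "$lv"
}

shrink_lv /dev/vg/lv 4200G
```

lvreduce also has a --resizefs option that is meant to handle the filesystem step for you. None of this helps much if the underlying disk can no longer be read.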
Unfortunately, while the other drive was being relatively friendly yesterday, it's being less so today..
Thanks for all your help on this one, guys.
Reet, resize2fs wasn't happy, it claimed:
"resize2fs: Can't read an block bitmap while trying to resize"
which, apart from being almost as grammatically sound as "all your base are belong to us", isn't tremendously helpful.
I stuck it in Google (it was the only Google suggestion btw, worrying when that happens) and the only help I could find was: pull the data off and start afresh.
Fortunately there's only 1.3TB-ish of data which isn't backed up, and now that I've got the FUBAR drives out I can swap 'em for some slightly more stable 1TB drives, so this shouldn't be a problem.
I can still mount the partition, but I'm guessing it wouldn't be too happy if I tried to put anything in the final 3TB of it :)
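Pulling the data off could look something like the sketch below: mount the damaged volume read-only so nothing gets written to it, then copy to the new disks. The mount points /mnt/old and /mnt/new are hypothetical, and it prints the commands unless DRY_RUN=0 is set:

```shell
#!/bin/sh
# Sketch only. /mnt/old and /mnt/new are hypothetical mount points.
# run() prints each command; set DRY_RUN=0 to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "+ $*"; fi
}

copy_off() {
    lv="$1"; src="$2"; dest="$3"
    # Read-only, so the damaged filesystem is not modified any further.
    run mount -o ro "$lv" "$src"
    # Archive mode keeps permissions, ownership and timestamps.
    run rsync -a "$src/" "$dest/"
}

copy_off /dev/vg/lv /mnt/old /mnt/new
```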
Now, does anyone know who to email to get this fixed? LVM's been great 'til now, but a storage array that becomes completely useless when 1 HDD fails is pathetic. This needs sorting out: while Ubuntu doesn't use LVM except under duress, Fedora has been using it by default for the last couple of years, and NixOS (a VERY interesting new distro, check it out if you haven't heard of it) can't really handle anything else.
Most worryingly of all, last time I played with Enterprise *nix (RHEL4) I'm pretty sure it wanted to use LVM at install time..
This needs fixing!
Try the mailing list mentioned here.
I suspect they'll tell you to add new drives and (after the vgcfgrestore) restore the data.
As for NixOS, I keep meaning to look at it, but I'm really more interested in Nix itself.