
x1um1n 03-19-2009 10:27 AM

LVM Nightmare, how do I remove FUBAR hdd's?
 
Hi Guys,
A couple of weeks ago I stuck a new drive in my lvm array. It worked fine for about three days, then started making that horrible click-scrape, click-scrape noise you get when the heads die..

I've been struggling since then to remove the bloody thing from my array, but whenever I boot it up it either doesn't find the drive, or it only finds it briefly.

This means I can't just lvreduce it.

I'm about at my wits' end..

If I remove the hdd, is there any way I can recover the rest of the array?

atm pvscan (or any other lvm command) just tells me it can't find all the pv's in my vg and so can't do anything..


I've done quite a lot of googling and haven't been able to find anything other than hints that I might be able to forcibly remove the pv's, but I can't find anything that tells me HOW.

Please help!

--

"Computer, if you don't open that hatch right now I'm going to go to your major databanks with a large axe and give you a reprogramming you'll never forget!"

Zaphod Beeblebrox

syg00 03-19-2009 02:23 PM

I don't use LVM (for several reasons; you may have just found another one), but the manpage for vgcfgrestore indicates you might be able to work around this if you are replacing the disk. Got a(nother) spare?
This bit is a bit of a worry ...
Quote:

If a PV in the VG is lost and you wish to substitute another of the same size ...
It doesn't offer any option if you want to use a different size; I guess pvcreate forcing the size might work, presuming you use a larger disk.
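
Going by the manpages, the recovery would look roughly like this. This is an untested sketch; the UUID, the archive file name and the device are placeholders you'd fill in from your own /etc/lvm/archive and the pvscan errors.

Code:

# Re-create the replacement PV using the UUID of the lost one, pointing at the
# saved metadata so LVM treats the new disk as the missing PV (all placeholders):
pvcreate --uuid "<uuid-of-lost-pv>" \
         --restorefile /etc/lvm/archive/<vgname>_NNNNN.vg /dev/sdX1

# Then restore the VG metadata from that file and re-activate the VG:
vgcfgrestore -f /etc/lvm/archive/<vgname>_NNNNN.vg <vgname>
vgchange -ay <vgname>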

Good luck.

<Edit:>
LVM is *not* for data integrity - you still need backups. You'll need to restore the LV(s) once the VG is re-built.
</Edit:>

lazlow 03-19-2009 02:49 PM

Not able to help you with this issue, but when you go to rebuild I can tell you how to do it without using LVM and get almost all the same functionality (see below). The only feature that cannot be handled in other (simpler, safer) ways is contiguous space (one large filesystem across multiple drives). You have just found one of the MANY PITA problems with LVM, which for most users offers no extra functionality.

x1um1n 03-19-2009 03:00 PM

Thanks Guys,
unfortunately the disk in question is a 1.5Tb and, from what I've been reading, these things are flaky as hell (as I'm now discovering first hand), so I was planning on replacing it with a 1Tb.

Also, the reason I use LVM is for its ability to spread a partition (currently 5.6Tb) across multiple disks.

I'm beginning to wish I'd just broken the tree down into a less elegant but more stable solution:

/blah/1 --> sda
/blah/2 --> sdb
/blah/3 --> sdc

Unless you can suggest a better option?

In the meantime I'll have a look into vgcfgrestore..

lazlow 03-19-2009 03:31 PM

Yep, keep each drive independent. That way, if any drive fails, you only have to deal with that one. You can set up your mount points on the master drive (sda usually) in such a manner that the casual user cannot tell they are actually using a different drive. I have /home on a separate drive (mount point on the master) and I have a /data subdirectory in each user's home that is in reality on a third drive. Once you understand mount points and fstab it is a breeze to set this up or rearrange it.
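
A minimal /etc/fstab sketch of that layout (devices, filesystems and the username are only examples, not anyone's actual setup):

Code:

# sda carries / and provides the mount points; the other drives hang off it
/dev/sda1   /                  ext3    defaults    0 1
/dev/sdb1   /home              ext3    defaults    0 2
/dev/sdc1   /home/fred/data    ext3    defaults    0 2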

x1um1n 03-19-2009 03:32 PM

Ok, vgcfgrestore can restore the array to the state recorded in a backup file.

Unfortunately the backup doesn't go back far enough.. :(

It looks like it's not 1 disk that's died, but 2.

Code:

root@omnius:~# pvscan
File descriptor 5 left open
  Couldn't find device with uuid 'XeBtci-t5fV-pvlz-Bwu9-CXqG-niq7-KjPgTV'.
  Couldn't find device with uuid 'CJEQAz-CenQ-Kn7q-4nZ5-X0oE-2s3F-V4AEcw'.
  Couldn't find device with uuid 'XeBtci-t5fV-pvlz-Bwu9-CXqG-niq7-KjPgTV'.
  Couldn't find device with uuid 'CJEQAz-CenQ-Kn7q-4nZ5-X0oE-2s3F-V4AEcw'.
  Couldn't find device with uuid 'XeBtci-t5fV-pvlz-Bwu9-CXqG-niq7-KjPgTV'.
  Couldn't find device with uuid 'CJEQAz-CenQ-Kn7q-4nZ5-X0oE-2s3F-V4AEcw'.
  Couldn't find device with uuid 'XeBtci-t5fV-pvlz-Bwu9-CXqG-niq7-KjPgTV'.
  Couldn't find device with uuid 'CJEQAz-CenQ-Kn7q-4nZ5-X0oE-2s3F-V4AEcw'.
  Couldn't find device with uuid 'XeBtci-t5fV-pvlz-Bwu9-CXqG-niq7-KjPgTV'.
  Couldn't find device with uuid 'CJEQAz-CenQ-Kn7q-4nZ5-X0oE-2s3F-V4AEcw'.
  Couldn't find device with uuid 'XeBtci-t5fV-pvlz-Bwu9-CXqG-niq7-KjPgTV'.
  Couldn't find device with uuid 'CJEQAz-CenQ-Kn7q-4nZ5-X0oE-2s3F-V4AEcw'.
  Couldn't find device with uuid 'XeBtci-t5fV-pvlz-Bwu9-CXqG-niq7-KjPgTV'.
  Couldn't find device with uuid 'CJEQAz-CenQ-Kn7q-4nZ5-X0oE-2s3F-V4AEcw'.
  Couldn't find device with uuid 'XeBtci-t5fV-pvlz-Bwu9-CXqG-niq7-KjPgTV'.
  Couldn't find device with uuid 'CJEQAz-CenQ-Kn7q-4nZ5-X0oE-2s3F-V4AEcw'.
  Couldn't find device with uuid 'XeBtci-t5fV-pvlz-Bwu9-CXqG-niq7-KjPgTV'.
  Couldn't find device with uuid 'CJEQAz-CenQ-Kn7q-4nZ5-X0oE-2s3F-V4AEcw'.
  Couldn't find device with uuid 'XeBtci-t5fV-pvlz-Bwu9-CXqG-niq7-KjPgTV'.
  Couldn't find device with uuid 'CJEQAz-CenQ-Kn7q-4nZ5-X0oE-2s3F-V4AEcw'.
  PV /dev/sdc1        VG omnius_vg  lvm2 [931.51 GB / 0    free]
  PV /dev/sdb1        VG omnius_vg  lvm2 [931.51 GB / 0    free]
  PV /dev/sda4        VG omnius_vg  lvm2 [670.11 GB / 0    free]
  PV /dev/sdd1        VG omnius_vg  lvm2 [931.51 GB / 0    free]
  PV unknown device  VG omnius_vg  lvm2 [1.36 TB / 0    free]
  PV unknown device  VG omnius_vg  lvm2 [1.36 TB / 972.00 MB free]
  Total: 6 [6.11 TB] / in use: 6 [6.11 TB] / in no VG: 0 [0  ]

The oldest backup LVM's got is from just before I added the last disk, and if I try to restore from it, I get this:

Code:

root@omnius:~# vgcfgrestore -f /etc/lvm/archive/omnius_vg_00000.vg omnius_vg
File descriptor 5 left open
  Couldn't find device with uuid 'XeBtci-t5fV-pvlz-Bwu9-CXqG-niq7-KjPgTV'.
  Couldn't find all physical volumes for volume group omnius_vg.
  Restore failed.

I'm going to try again in a few hours; the kernel seems to pick the disks up intermittently. Maybe if I leave the newest disk out it'll be able to see the slightly older one, and I can try to lvreduce/resize2fs that one out of the array :$
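
If the vgcfgrestore route fails completely, I gather the brute-force way to get rid of the missing PVs would be something along these lines (untested, holding it in reserve; it's supposed to sacrifice whatever was on the dead disks):

Code:

# Activate whatever can still be activated despite the missing PVs
vgchange -ay --partial omnius_vg

# Throw away the PVs LVM can no longer find; needs --force if any LV
# actually uses them, and whatever lived on the dead disks is lost
vgreduce --removemissing omnius_vg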

x1um1n 03-19-2009 03:38 PM

@lazlow

unfortunately, this one partition is the equivalent of your /data dir. I used to have it broken down into categories, but the volume of data has outgrown that solution. Some categories are WAY bigger than 1Tb, others are only 200Gb.

That's why I turned to lvm in the first place.

lazlow 03-19-2009 04:15 PM

If you are getting that big, skip LVM and go straight to RAID. Go ahead and spend the $300 for a true hardware RAID card. I know it seems like a ton of money, but in the long run it will save you time and hair.

x1um1n 03-20-2009 09:25 PM

Reet, managed to get it sorted, mostly..

I used vgcfgrestore to revert to how it was before the last drive was put in. Then physically removed that drive from the box.

The other drive, which is dying (but not yet dead), I've managed to deal with via lvresize -L-1.4T /dev/vg/lv; now I'm just trying to get resize2fs to shrink the fs to match the new LV size (then I can see how much data I've lost :()
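
Side note in case anyone copies this later: when the disks aren't actively dying under you, the usual order for shrinking is apparently the other way round, filesystem first and then the LV, or let lvresize drive both if your lvm2 is new enough to have --resizefs. The sizes below are just placeholders:

Code:

# Shrink the fs first (after a forced fsck), then the LV, leaving a safety margin.
# Placeholder sizes, not the real ones for this array.
e2fsck -f /dev/vg/lv
resize2fs /dev/vg/lv 4000G
lvresize -L 4100G /dev/vg/lv

# or, if supported, let lvresize call resize2fs itself:
lvresize --resizefs -L 4100G /dev/vg/lv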

thank fsck for backups..

unfortunately, while the other drive was being relatively friendly yesterday, it's being less so today..

Thanks for all your help on this one guys

x1um1n 03-20-2009 10:43 PM

Reet, resize2fs wasn't happy; it claimed:

"resize2fs: Can't read an block bitmap while trying to resize"

which, apart from being almost as grammatically sound as "all your base are belong to us", isn't tremendously helpful.

I stuck it in Google (it was the only Google suggestion, btw; worrying when that happens) and the only help I could find was: pull the data off and start afresh..

fortunately there's only 1.3Tb-ish of data which isn't backed up, and now I've got the FUBAR drives out I can swap 'em for some slightly more stable 1Tb drives, so this shouldn't be a problem.

I can still mount the partition, but I'm guessing it wouldn't be too happy if I tried to put anything in the final 3Tb of it :)
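
In case anyone else lands here from Google with the same error: resize2fs won't shrink a filesystem until it's had a forced fsck, so on a healthier array the usual incantation would be roughly the following. It didn't save me here, presumably because the missing PVs took part of the filesystem with them.

Code:

# resize2fs wants a clean, forced fsck before it will shrink anything
umount /dev/vg/lv
e2fsck -f /dev/vg/lv
resize2fs /dev/vg/lv <new-size>   # <new-size> smaller than the LV, e.g. in G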

---

Now, does anyone know who to email to get this fixed? LVM's been great 'til now, but a storage array that becomes completely useless when one hdd fails is pathetic. This needs sorting out: while Ubuntu doesn't use LVM except under duress, Fedora has been using it by default for the last couple of years, and NixOS (a VERY interesting new distro, check it out if you haven't heard of it) can't really handle anything else.

Most worryingly of all, last time I played with enterprise *nix (RHEL4), I'm pretty sure it wanted to use LVM at install time..

This needs fixing!

syg00 03-20-2009 11:05 PM

Try the mailing list mentioned here.
I suspect they'll tell you to add new drives and (after the vgcfgrestore) restore the data.

As for NixOS, I keep meaning to look at it, but I'm really more interested in Nix itself.

