
Bmop 08-13-2010 08:17 PM

Replacing dead drive in an LVM that consisted of 3 drives
 
Hi, I have an old computer here running Fedora Core 2 that had 3 hard drives mounted as LVM2.

This is the output from parted for the first disk -
Code:

Using /dev/sda
Information: The operating system thinks the geometry on /dev/sda is
48641/255/63.  Therefore, cylinder 1024 ends at 8032.499M.
(parted) print
Disk geometry for /dev/sda: 0.000-381554.085 megabytes
Disk label type: msdos
Minor    Start      End    Type      Filesystem  Flags
1          0.031  7820.705  primary  ntfs        boot
2      7820.706  15641.411  primary  ext3
3      15641.411  16669.006  primary  linux-swap
4      16669.006 381551.594  primary              lvm

This is for the second disk -
Code:

Using /dev/sdb
Information: The operating system thinks the geometry on /dev/sdb is
48641/255/63.  Therefore, cylinder 1024 ends at 8032.499M.
(parted) print
Disk geometry for /dev/sdb: 0.000-381554.085 megabytes
Disk label type: msdos
Minor    Start      End    Type      Filesystem  Flags
1          0.031 381551.594  primary              lvm

...and the third disk is dead. It spins, but makes a scraping noise and is not recognized by the BIOS, although the BIOS hangs at startup trying to recognize it.

This is what I get for pvscan -
Code:

[root@computer ~]# pvscan
Couldn't find device with uuid 'W0EUm5-wP50-qZNu-r81K-VNec-vivn-DDjXAx'.
  Couldn't find device with uuid 'W0EUm5-wP50-qZNu-r81K-VNec-vivn-DDjXAx'.
  PV unknown device  VG vhe8_disks  lvm2 [372.56 GB / 0    free]
  PV /dev/sda4        VG vhe8_disks  lvm2 [356.31 GB / 0    free]
  PV /dev/sdb1        VG vhe8_disks  lvm2 [372.59 GB / 0    free]
  Total: 3 [1.08 TB] / in use: 3 [1.08 TB] / in no VG: 0 [0  ]

...and lvscan
Code:

  [root@computer ~]# lvscan
Couldn't find device with uuid 'W0EUm5-wP50-qZNu-r81K-VNec-vivn-DDjXAx'.
  Couldn't find all physical volumes for volume group vhe8_disks.
  Couldn't find device with uuid 'W0EUm5-wP50-qZNu-r81K-VNec-vivn-DDjXAx'.
  Couldn't find all physical volumes for volume group vhe8_disks.
  Volume group "vhe8_disks" not found

...and vgscan
Code:

  [root@computer ~]# vgscan
Reading all physical volumes.  This may take a while...
  Couldn't find device with uuid 'W0EUm5-wP50-qZNu-r81K-VNec-vivn-DDjXAx'.
  Couldn't find all physical volumes for volume group vhe8_disks.
  Couldn't find device with uuid 'W0EUm5-wP50-qZNu-r81K-VNec-vivn-DDjXAx'.
  Couldn't find all physical volumes for volume group vhe8_disks.
  Volume group "vhe8_disks" not found
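
(For reference, a sketch of how the surviving PVs and the missing UUID could be inspected at this point; device and VG names are taken from the output above, and --partial depends on the LVM2 version supporting it:)
Code:

# list the PVs with their UUIDs to see which one the VG is missing
pvs -o pv_name,pv_uuid,vg_name
# if supported, activate the VG despite the missing PV so the intact
# extents are at least readable (areas on the missing disk will error)
vgchange -ay --partial vhe8_disks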

So I looked up solutions to this problem, and I arrived at the following website - http://www.novell.com/coolsolutions/appnote/19386.html

I followed the directions titled "Disk Permanently Removed" in hopes that I could at least get access to the data on the two good disks. I got a new 500GB HD (the previous drive was 400GB; both spin at 7200 RPM). It wasn't formatted or anything, brand spankin' new right out of the package. Here is its information via parted -
Code:

Disk geometry for /dev/sdc: 0.000-476940.023 megabytes
Disk label type: msdos
Minor    Start      End    Type      Filesystem  Flags

Nothin! That's good right? The directions on that page say "1. Add a replacement disk to the server. Make sure the disk is empty." That disk was as empty as it can get.

So, I continue following directions -
Code:

[root@computer ~]# pvcreate --uuid W0EUm5-wP50-qZNu-r81K-VNec-vivn-DDjXAx /dev/sdc
  No physical volume label read from /dev/sdc
  Physical volume "/dev/sdc" successfully created

Ok, the directions didn't say I'd see "No physical volume label read from /dev/sdc" but the next line looks good, so I continue -
Code:

[root@computer ~]# vgcfgrestore vhe8_disks
  Restored volume group vhe8_disks

[root@computer ~]# vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "vhe8_disks" using metadata type lvm2


[root@computer ~]# vgchange -ay vhe8_disks
  1 logical volume(s) in volume group "vhe8_disks" now active

Looks good so far! On to the last step -
Code:

[root@computer ~]# e2fsck -y /dev/vhe8_disks/data
e2fsck 1.35 (28-Feb-2004)
Couldn't find ext2 superblock, trying backup blocks...
e2fsck: Bad magic number in super-block while trying to open /dev/vhe8_disks/data
 
The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

aww crap.
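
(For reference, a sketch of how backup-superblock locations could be found and tried, assuming the filesystem really was ext2/ext3; mke2fs -n is a dry run and writes nothing:)
Code:

# print where mke2fs *would* put superblock backups; -n makes no changes
mke2fs -n /dev/vhe8_disks/data
# then point e2fsck at one of the reported backup locations, e.g.
e2fsck -b 32768 -B 4096 /dev/vhe8_disks/data
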
So I tried many different things, including formatting the disk beforehand. I formatted it as ext2 and as reiserfs, and the outcome was the same. I also went into fdisk, created a partition, and set its system id to Linux LVM (8e); this is the output -
Code:

Command (m for help): p
 
Disk /dev/sdc: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
 
  Device Boot      Start        End      Blocks  Id  System
/dev/sdc1              1      60801  488384001  8e  Linux LVM

Then I repeated the whole process, with the same outcome. I've also used /dev/sdc1 in the pvcreate step instead of just /dev/sdc, but it still fails the same way at the e2fsck step.
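
(For reference, later LVM2 releases expect a --restorefile argument alongside --uuid so that the PV layout matches the metadata backup; a sketch only, with the backup path assumed:)
Code:

# recreate the missing PV on the new partition, reusing the old UUID and
# taking its layout from the metadata backup (syntax of later LVM2 releases)
pvcreate --uuid W0EUm5-wP50-qZNu-r81K-VNec-vivn-DDjXAx \
         --restorefile /etc/lvm/backup/vhe8_disks /dev/sdc1
vgcfgrestore vhe8_disks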

I can't get the computer to boot without commenting the volume out of fstab, and even then the volume won't mount.

Is there any way to view the data on the two working disks? I'm stumped.

smoker 08-13-2010 11:02 PM

You have to resize the filesystem after the pv has been created. The volume group descriptors are probably still referring to the metadata on the old disk, and without the correct metadata you have little hope of finding any files. LVM doesn't need the disks to be partitioned.

You could complete the process of adding the new disk by allocating physical extents to the new PV and then resizing the filesystem to include that space, and see if that helps. You should extend the volume group to include the new disk.
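
(Roughly, that sequence would look like the following; a sketch only, with device and LV names assumed from the earlier output, and resize2fs needs a readable filesystem to work on:)
Code:

vgextend vhe8_disks /dev/sdc                  # add the new disk to the volume group
lvextend -l +100%FREE /dev/vhe8_disks/data    # allocate the new extents to the LV
                                              # (+100%FREE is newer LVM2 syntax; older
                                              #  versions take an explicit extent count)
resize2fs /dev/vhe8_disks/data                # grow the filesystem over the added space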

At this stage you might also try this and see if you can get the volume to mount.

You might get access to some files on the original 2 disks, but you may find files listed that don't actually exist, due to them being on the dead drive.

syg00 08-14-2010 12:00 AM

@smoker, you've lost me. The OP did a vgcfgrestore and vgscan worked, so the metadata appears valid. It depends on whether things have changed since the backup was taken.
The error appears to be with the filesystem itself. What filesystem was there before the failure? Which was the first disk in the VG?

Bmop 08-14-2010 03:54 AM

Yeah, I think the problem is in the file system. After running through the pvcreate/vgcfgrestore process with a completely new and empty disk, I booted the computer from an Ubuntu live CD and looked at the disks with gparted. It showed the file systems on the two working disks as lvm2, but the new disk as unallocated.

Next, I tried changing the system id to Linux LVM. Unfortunately I don't have access to this computer at the moment, but one thing I did notice was a slight difference in file systems when I ran parted on them. I got something like this (since I don't have access to the computer right now, this isn't completely accurate; asterisks indicate specific numbers I don't know off the top of my head. The important part to notice is the Filesystem column in the last line) -
Code:

Using /dev/sdc
Information: The operating system thinks the geometry on /dev/sdc is
************.  Therefore, cylinder 1024 ends at ********M.
(parted) print
Disk geometry for /dev/sdc: 0.000-476940.023 megabytes
Disk label type: msdos
Minor    Start      End    Type      Filesystem  Flags
1          0.031 **********  primary      ext2    lvm

The two working disks don't show any file system (as you can see in my initial post). I just ran fdisk to create a partition and then changed its system id to 8e (Linux LVM); I didn't format it as ext2.
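
(For reference, those fdisk steps look roughly like this; the interactive keystrokes are shown as comments:)
Code:

fdisk /dev/sdc
#   n, p, 1, <Enter>, <Enter>   create one primary partition spanning the disk
#   t, 8e                       change its system id to Linux LVM
#   w                           write the partition table and exit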

Quote:

You have to resize the filesystem after the pv has been created.
I'm wary of corrupting data by resizing the volume without the original 3 disks. Does using a 500GB disk as opposed to a 400GB disk make a difference? Or should I even be using the exact same model of disk as the bad one?

Quote:

What filesystem was there before the failure? Which was the first disk in the VG?
I'm not sure what file system was there before. I'll give you a little background on this fiasco: this computer was not being used when I was tasked with recovering this data. When my coworker initially tried to boot it, it wouldn't boot. If I recall correctly, the BIOS only recognized one disk at the time, which was in the drive 1 slot (this was the disk that is now sda; it was sdb at that time). At the time I had no idea I was dealing with a logical volume, so I just began checking for dead disks. I rearranged them so the two working disks were in drives 0 and 1. I found that the sda disk (the one with 4 partitions) must come before the sdb disk for the computer to boot (i.e., sda had to be in slot 0 and sdb in slot 1, 2, or 3; or sda in slot 1 and sdb in slot 2 or 3, but not 0; etc.), and that I had to comment out the logical volume in fstab.
So I'm only assuming the disk that is currently sda was the first disk in the VG, since it's the disk with the / partition. However, it very well could have been the dead disk; I really don't know. One thing that occurred to me is that the superblock could be on the dead disk. Would this matter for a logical volume?

There are backups for the metadata stored on the computer at /etc/lvm/backup and /etc/lvm/archive. I'll post them when I go back to work on monday.
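
(For reference, a sketch of how the archived metadata can be listed and a particular version restored; the archive file name below is illustrative:)
Code:

# show which backup/archive files LVM knows about for this VG
vgcfgrestore --list vhe8_disks
# restore the VG metadata from a specific archive file
vgcfgrestore -f /etc/lvm/archive/vhe8_disks_00004.vg vhe8_disks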

This would all be very simple if only they'd created backups!!

At this point I'm seriously thinking about freezing the dead disk (http://www.datarecoverypros.com/hard...ry-freeze.html), hoping it works, and doing a dd!
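
(If it comes to that, a rough sketch of imaging a failing disk first; device and output paths are assumed, and GNU ddrescue, where available, copes with read errors better than plain dd:)
Code:

# skip unreadable sectors and pad them with zeros so offsets stay aligned
dd if=/dev/sdX of=/mnt/spare/dead_disk.img bs=64k conv=noerror,sync
# or with GNU ddrescue, retrying bad areas and logging what was recovered
# ddrescue -r3 /dev/sdX /mnt/spare/dead_disk.img /mnt/spare/dead_disk.log
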
Thanks for all your help everyone, I really appreciate it!:hattip:

smoker 08-14-2010 07:40 AM

When I had a disk die on me, it was one of a set of 5 in LVM. I added a new disk as per the normal method, forcibly removed the old one from the group, then ran the utility that rebuilds the metadata from the existing file. I lost whatever files were on the dead disk, but I got the rest of them back. Overall, I lost about 5 GB of files from (at the time) an LVM that had 700 GB written to it.

I see the OP has moved disks around, and this could also complicate matters. I did have a howto that I wrote when my disk went bad, but due to an unfortunate (and careless) rm error recently, I lost a lot of text files in my home directory!

The disks don't have to be the same size if you replace them. I suggested adding the new extents and resizing the volume only to get the LV as close to working order as possible (this should also rewrite the metadata properly). vgscan and vgchange show the VG and its LV are there, but it won't mount, so it seems appropriate to rebuild the LV properly. Otherwise there isn't much point using the new disk at all; just forcibly remove the dead PV from the VG.
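
(For reference, the "forcibly remove" step usually means something like the following; a sketch only, and worth running only after anything recoverable has been copied off, since extents on the missing disk are given up:)
Code:

# drop the missing PV from the volume group and rewrite the metadata without it
vgreduce --removemissing vhe8_disks
# newer LVM2 needs --force if logical volumes still have extents on the missing PV
# vgreduce --removemissing --force vhe8_disks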

Bmop 08-18-2010 05:59 PM

This is the backup file found in /etc/lvm/backup/
Code:

# Generated by LVM2: Fri Oct 15 14:20:06 2004

contents = "Text Format Volume Group"
version = 1

description = "Created *after* executing 'lvextend -l+23325 /dev/vhe8_disks/data /dev/sda4 /dev/sdc1'"

creation_host = "computer.edu"    # Linux computer.edu 2.6.8-1.521smp #1 SMP Mon Aug 16 09:25:06 EDT 2004 i686
creation_time = 1097875206    # Fri Oct 15 14:20:06 2004

vhe8_disks {
    id = "koikjy-RIW0-Xh8J-LFhm-MD6r-74cz-0ynKUd"
    seqno = 4
    status = ["RESIZEABLE", "READ", "WRITE"]
    extent_size = 65536        # 32 Megabytes
    max_lv = 255
    max_pv = 255

    physical_volumes {

        pv0 {
            id = "W0EUm5-wP50-qZNu-r81K-VNec-vivn-DDjXAx"
            device = "/dev/sdb1"    # Hint only

            status = ["ALLOCATABLE"]
            pe_start = 384
            pe_count = 11922    # 372.562 Gigabytes
        }

        pv1 {
            id = "CEZMJG-39cx-Xner-ktnw-93Xn-Ipx3-h3uOw4"
            device = "/dev/sda4"    # Hint only

            status = ["ALLOCATABLE"]
            pe_start = 384
            pe_count = 11402    # 356.312 Gigabytes
        }

        pv2 {
            id = "X3RRy8-d8mK-fZb2-Zt4v-Pp0z-bWTC-YUZ9xf"
            device = "/dev/sdc1"    # Hint only

            status = ["ALLOCATABLE"]
            pe_start = 384
            pe_count = 11923    # 372.594 Gigabytes
        }
    }

    logical_volumes {

        data {
            id = "Us6orV-LYkq-XLGA-84Ep-AqDi-G7DX-G9K5rs"
            status = ["READ", "WRITE", "VISIBLE"]
            segment_count = 3

            segment1 {
                start_extent = 0
                extent_count = 11922    # 372.562 Gigabytes

                type = "striped"
                stripe_count = 1    # linear

                stripes = [
                    "pv0", 0
                ]
            }
            segment2 {
                start_extent = 11922
                extent_count = 11402    # 356.312 Gigabytes

                type = "striped"
                stripe_count = 1    # linear

                stripes = [
                    "pv1", 0
                ]
            }
            segment3 {
                start_extent = 23324
                extent_count = 11923    # 372.594 Gigabytes

                type = "striped"
                stripe_count = 1    # linear

                stripes = [
                    "pv2", 0
                ]
            }
        }
    }
}
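
(As a quick sanity check, the sizes in this file line up with the earlier pvscan output; extent_size is given in 512-byte sectors:)
Code:

# extent_size = 65536 sectors x 512 bytes = 32 MiB per extent
# pv0: 11922 extents x 32 MiB = 372.56 GiB   (the UUID pvscan reports as missing)
# pv1: 11402 extents x 32 MiB = 356.31 GiB
# pv2: 11923 extents x 32 MiB = 372.59 GiB
# LV "data": 11922 + 11402 + 11923 = 35247 extents = ~1101.5 GiB (the 1.08 TB total)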

I rearranged the disks to reflect their positions as shown in the backup file, but I still get the same issue at the e2fsck step. Wouldn't there be other places where the superblock is stored? The only way I can see this being an issue is if the only copy of the superblock was on the dead disk. That sorta makes sense, because pv0 has the UUID of the missing disk. But still, I'd think there'd be backups on the other disks.

As you can see, the data is striped. Does this mean that I'll only be able to recover 2/3 of each file? If so, it would seem the only option is to recover the data from the dead disk.

