LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - Hardware (http://www.linuxquestions.org/questions/linux-hardware-18/)
-   -   Replacing dead drive in an LVM that consisted of 3 drives (http://www.linuxquestions.org/questions/linux-hardware-18/replacing-dead-drive-in-an-lvm-that-consisted-of-3-drives-826117/)

Bmop 08-13-2010 08:17 PM

Replacing dead drive in an LVM that consisted of 3 drives
 
Hi, I have an old computer here running Fedora Core 2 that had 3 hard drives mounted as LVM2.

This is the output from parted for the first disk -
Code:

Using /dev/sda
Information: The operating system thinks the geometry on /dev/sda is
48641/255/63.  Therefore, cylinder 1024 ends at 8032.499M.
(parted) print
Disk geometry for /dev/sda: 0.000-381554.085 megabytes
Disk label type: msdos
Minor    Start      End    Type      Filesystem  Flags
1          0.031  7820.705  primary  ntfs        boot
2      7820.706  15641.411  primary  ext3
3      15641.411  16669.006  primary  linux-swap
4      16669.006 381551.594  primary              lvm

This is for the second disk -
Code:

Using /dev/sdb
Information: The operating system thinks the geometry on /dev/sdb is
48641/255/63.  Therefore, cylinder 1024 ends at 8032.499M.
(parted) print
Disk geometry for /dev/sdb: 0.000-381554.085 megabytes
Disk label type: msdos
Minor    Start      End    Type      Filesystem  Flags
1          0.031 381551.594  primary              lvm

...and the third disk is dead. It spins, but makes a scraping noise and isn't recognized by the BIOS, although the BIOS hangs at startup trying to detect it.

This is what I get for pvscan -
Code:

[root@computer ~]# pvscan
Couldn't find device with uuid 'W0EUm5-wP50-qZNu-r81K-VNec-vivn-DDjXAx'.
  Couldn't find device with uuid 'W0EUm5-wP50-qZNu-r81K-VNec-vivn-DDjXAx'.
  PV unknown device  VG vhe8_disks  lvm2 [372.56 GB / 0    free]
  PV /dev/sda4        VG vhe8_disks  lvm2 [356.31 GB / 0    free]
  PV /dev/sdb1        VG vhe8_disks  lvm2 [372.59 GB / 0    free]
  Total: 3 [1.08 TB] / in use: 3 [1.08 TB] / in no VG: 0 [0  ]

...and lvscan
Code:

  [root@computer ~]# lvscan
Couldn't find device with uuid 'W0EUm5-wP50-qZNu-r81K-VNec-vivn-DDjXAx'.
  Couldn't find all physical volumes for volume group vhe8_disks.
  Couldn't find device with uuid 'W0EUm5-wP50-qZNu-r81K-VNec-vivn-DDjXAx'.
  Couldn't find all physical volumes for volume group vhe8_disks.
  Volume group "vhe8_disks" not found

...and vgscan
Code:

  [root@computer ~]# vgscan
Reading all physical volumes.  This may take a while...
  Couldn't find device with uuid 'W0EUm5-wP50-qZNu-r81K-VNec-vivn-DDjXAx'.
  Couldn't find all physical volumes for volume group vhe8_disks.
  Couldn't find device with uuid 'W0EUm5-wP50-qZNu-r81K-VNec-vivn-DDjXAx'.
  Couldn't find all physical volumes for volume group vhe8_disks.
  Volume group "vhe8_disks" not found
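One small sanity check on the pvscan output above: the three PV sizes still sum to the reported total, so LVM still believes all three PVs belong to vhe8_disks; only the device backing the first one is gone. A quick back-of-the-envelope check (numbers copied from the pvscan output):

```python
# PV sizes in GB as reported by pvscan (the first one is the missing disk).
pvs = {
    "unknown device": 372.56,  # the dead drive's PV
    "/dev/sda4": 356.31,
    "/dev/sdb1": 372.59,
}

total_gb = sum(pvs.values())
print(round(total_gb / 1024, 2))  # matches pvscan's "Total: 3 [1.08 TB]"
```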

So I looked up solutions to this problem, and I arrived at the following website - http://www.novell.com/coolsolutions/appnote/19386.html

I followed the directions in the section titled "Disk Permanently Removed", hoping I could at least get access to the data on the two good disks. I got a new 500GB HD (the dead drive was 400GB; both spin at 7200RPM). It wasn't formatted or anything, brand spankin' new right out of the package. Here is its information via parted -
Code:

Disk geometry for /dev/sdc: 0.000-476940.023 megabytes
Disk label type: msdos
Minor    Start      End    Type      Filesystem  Flags

Nothing! That's good, right? The directions on that page say "1. Add a replacement disk to the server. Make sure the disk is empty." That disk was as empty as it can get.

So, I continue following directions -
Code:

[root@computer ~]# pvcreate --uuid W0EUm5-wP50-qZNu-r81K-VNec-vivn-DDjXAx /dev/sdc
  No physical volume label read from /dev/sdc
  Physical volume "/dev/sdc" successfully created

Ok, the directions didn't say I'd see "No physical volume label read from /dev/sdc", but the next line looks good, so I continue -
Code:

[root@computer ~]# vgcfgrestore vhe8_disks
  Restored volume group vhe8_disks

[root@computer ~]# vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "vhe8_disks" using metadata type lvm2


[root@computer ~]# vgchange -ay vhe8_disks
  1 logical volume(s) in volume group "vhe8_disks" now active

Looks good so far! On to the last step -
Code:

[root@computer ~]# e2fsck -y /dev/vhe8_disks/data
e2fsck 1.35 (28-Feb-2004)
Couldn't find ext2 superblock, trying backup blocks...
e2fsck: Bad magic number in super-block while trying to open /dev/vhe8_disks/data
 
The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

aww crap.
So I tried many different things, including formatting the disk beforehand, as both ext2 and reiserfs; the outcome was the same. I also went into fdisk, created a partition, and changed its type, and this is the output -
Code:

Command (m for help): p
 
Disk /dev/sdc: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
 
  Device Boot      Start        End      Blocks  Id  System
/dev/sdc1              1      60801  488384001  8e  Linux LVM

Then I repeated the whole process, with the same outcome. I've also used /dev/sdc1 in the pvcreate step instead of just /dev/sdc, but it still fails the same way at the e2fsck step.
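As an aside on that e2fsck message: the -b 8193 hint assumes a 1 KiB block size, and the backup superblock locations shift with block size. A rough sketch of where they land (my own layout sketch, assuming ext2 defaults with sparse_super, where copies live in block groups 0, 1 and powers of 3, 5 and 7):

```python
def superblock_copies(block_kib, n=6):
    """First n block numbers holding a superblock copy.

    Assumes ext2 defaults: blocks_per_group = 8 * block size in bytes
    (one block bitmap's worth of bits), and 1 KiB filesystems start at
    block 1.  Group 0 holds the primary copy; the rest are backups.
    """
    blocks_per_group = 8 * block_kib * 1024
    first_block = 1 if block_kib == 1 else 0

    def has_copy(g):
        # sparse_super: copies in groups 0, 1 and powers of 3, 5, 7.
        if g in (0, 1):
            return True
        for p in (3, 5, 7):
            power = p
            while power < g:
                power *= p
            if power == g:
                return True
        return False

    copies, g = [], 0
    while len(copies) < n:
        if has_copy(g):
            copies.append(g * blocks_per_group + first_block)
        g += 1
    return copies

print(superblock_copies(1))  # [1, 8193, 24577, 40961, 57345, 73729]
print(superblock_copies(4))  # [0, 32768, 98304, 163840, 229376, 294912]
```

So e2fsck -b 32768 <device> would be the usual thing to try on a 4 KiB filesystem. Of course, no backup superblock helps if the blocks it lived on were physically on the dead disk.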

I can't get the computer to boot without commenting out the volume in fstab, and the volume itself won't mount.

Is there any way to view the data on the two working disks? I'm stumped.

smoker 08-13-2010 11:02 PM

You have to resize the filesystem after the PV has been created. The volume group descriptors are probably still referring to the metadata on the old disk, and without the correct metadata you have little hope of finding any files. Note that LVM doesn't need the disks to be partitioned.

You could complete the process of adding the new disk by allocating the physical extents to the new PV, then resizing the filesystem to include that space, and see if that helps. You should also extend the volume group to include the new disk.

At this stage you might also try this and see if you can get the volume to mount.

You might get access to some files on the original 2 disks, but you may find files listed that don't actually exist, due to them being on the dead drive.

syg00 08-14-2010 12:00 AM

@smoker you've lost me. The OP did a vgcfgrestore and vgscan worked, so the metadata appears valid. It depends on whether things have changed since the backup was taken.
The error appears to be with the filesystem itself. What f/s was there before the failure? Which was the first disk in the VG?

Bmop 08-14-2010 03:54 AM

Yeah, I think the problem is in the file system. After running through the pvcreate/vgcfgrestore process with a completely new and empty disk, I booted the computer from an Ubuntu live CD and looked at the disks with gparted. It showed the file systems on the two working disks as lvm2, but the new disk as unallocated.

Next, I tried changing the partition's system ID to Linux LVM. Unfortunately I don't have access to this computer at the moment, but one thing I did notice was a slight difference in file systems when I ran parted on them. I got something like this (like I said, I don't have access to the computer right now, so this isn't completely accurate; asterisks indicate specific numbers I don't know off the top of my head. The important part I want you to notice is the Filesystem column). -
Code:

Using /dev/sdc
Information: The operating system thinks the geometry on /dev/sdc is
************.  Therefore, cylinder 1024 ends at ********M.
(parted) print
Disk geometry for /dev/sdc: 0.000-476940.023 megabytes
Disk label type: msdos
Minor    Start      End    Type      Filesystem  Flags
1          0.031 **********  primary      ext2    lvm

The two working disks don't show any file system (as you can see in my initial post). I just ran fdisk to create a partition, then changed its system ID to 8e (Linux LVM); I didn't format it as ext2.

Quote:

You have to resize the filesystem after the pv has been created.
I'm wary of corrupting data by resizing the volume without the original 3 disks. Does using a 500GB disk as opposed to a 400GB disk make a difference? Or should I be using the exact same disk as the bad one?

Quote:

What f/s was there before the failure? Which was the first disk in the VG?
I'm not sure what file system was there before. I'll give you a little background on this fiasco: this computer was not being used when I was tasked with recovering the data. When my coworker initially tried to boot it, it wouldn't boot. If I recall correctly, the BIOS only recognized one disk at the time, which was in the drive 1 slot (this is the disk that is now sda; it was sdb at that time). At the time I had no idea I was dealing with a logical volume, so I just began checking for dead disks. I rearranged them so the two working disks were in drives 0 and 1. I found that the sda disk (the one with 4 partitions) must come before the sdb disk for the computer to boot (i.e., sda had to be in slot 0 and sdb in slot 1, 2, or 3; or sda in slot 1 and sdb in slot 2 or 3, but not 0; etc.), and that I had to comment out the logical volume in fstab.
So I'm only assuming the disk that is currently sda was the first disk in the VG, since it's the disk with the / partition. However, it very well could have been the dead disk; I really don't know. One thing that occurred to me is that the superblock could be on the dead disk. Would this matter for a logical volume?

There are backups for the metadata stored on the computer at /etc/lvm/backup and /etc/lvm/archive. I'll post them when I go back to work on monday.

This would all be very simple if only they'd created backups!!

At this point I'm seriously thinking about freezing the dead disk (http://www.datarecoverypros.com/hard...ry-freeze.html), hoping it spins up long enough to do a dd!
Thanks for all your help everyone, I really appreciate it!:hattip:

smoker 08-14-2010 07:40 AM

When I had a disk die on me, it was one of a set of 5 in an LVM. I added a new disk as per the normal method, forcibly removed the old one from the group, then ran the utility that rebuilds the metadata from the existing file. I lost whatever files were on the dead disk, but I got the rest of them back. Overall, I lost about 5 GB of files from (at the time) an LVM that had 700 GB written to it.

I see the OP has moved disks around, and this could also complicate matters. I did have a howto that I wrote when my disk went bad, but due to an unfortunate (and careless) rm error recently, I lost a lot of text files in my home directory !

The disks don't have to be the same size if you replace them. I suggested adding the new extents and resizing the volume only to get the LV as close to working order as possible (this should also rewrite the metadata properly). vgscan shows the volume group is there, but the LV isn't mounting, so it seems appropriate to rebuild the LV properly. Otherwise there isn't much point in using the new disk at all; just forcibly remove the dead PV from the VG.

Bmop 08-18-2010 05:59 PM

This is the backup file found in /etc/lvm/backup/
Code:

# Generated by LVM2: Fri Oct 15 14:20:06 2004

contents = "Text Format Volume Group"
version = 1

description = "Created *after* executing 'lvextend -l+23325 /dev/vhe8_disks/data /dev/sda4 /dev/sdc1'"

creation_host = "computer.edu"    # Linux computer.edu 2.6.8-1.521smp #1 SMP Mon Aug 16 09:25:06 EDT 2004 i686
creation_time = 1097875206    # Fri Oct 15 14:20:06 2004

vhe8_disks {
    id = "koikjy-RIW0-Xh8J-LFhm-MD6r-74cz-0ynKUd"
    seqno = 4
    status = ["RESIZEABLE", "READ", "WRITE"]
    extent_size = 65536        # 32 Megabytes
    max_lv = 255
    max_pv = 255

    physical_volumes {

        pv0 {
            id = "W0EUm5-wP50-qZNu-r81K-VNec-vivn-DDjXAx"
            device = "/dev/sdb1"    # Hint only

            status = ["ALLOCATABLE"]
            pe_start = 384
            pe_count = 11922    # 372.562 Gigabytes
        }

        pv1 {
            id = "CEZMJG-39cx-Xner-ktnw-93Xn-Ipx3-h3uOw4"
            device = "/dev/sda4"    # Hint only

            status = ["ALLOCATABLE"]
            pe_start = 384
            pe_count = 11402    # 356.312 Gigabytes
        }

        pv2 {
            id = "X3RRy8-d8mK-fZb2-Zt4v-Pp0z-bWTC-YUZ9xf"
            device = "/dev/sdc1"    # Hint only

            status = ["ALLOCATABLE"]
            pe_start = 384
            pe_count = 11923    # 372.594 Gigabytes
        }
    }

    logical_volumes {

        data {
            id = "Us6orV-LYkq-XLGA-84Ep-AqDi-G7DX-G9K5rs"
            status = ["READ", "WRITE", "VISIBLE"]
            segment_count = 3

            segment1 {
                start_extent = 0
                extent_count = 11922    # 372.562 Gigabytes

                type = "striped"
                stripe_count = 1    # linear

                stripes = [
                    "pv0", 0
                ]
            }
            segment2 {
                start_extent = 11922
                extent_count = 11402    # 356.312 Gigabytes

                type = "striped"
                stripe_count = 1    # linear

                stripes = [
                    "pv1", 0
                ]
            }
            segment3 {
                start_extent = 23324
                extent_count = 11923    # 372.594 Gigabytes

                type = "striped"
                stripe_count = 1    # linear

                stripes = [
                    "pv2", 0
                ]
            }
        }
    }
}

I rearranged the disks to reflect their positions as shown in the backup file, but I still get the same issue at the e2fsck step. Wouldn't the superblock be stored in other areas too? The only explanation I can think of is that the only copy of the superblock was on the dead disk. This sort of makes sense, because pv0 has the UUID of the missing disk. But still, I'd think there'd be backups on the other disks.

As you can see, the segments are type "striped" (though with stripe_count = 1, which the comments mark as linear). Does this mean that I'll only be able to recover 2/3 of each file? If so, it would seem the only option is to recover the data from the dead disk.
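Working through the segment table from the backup file (extent counts copied verbatim) shows why the superblock is gone: with stripe_count = 1 each segment is linear, so the LV is three disks laid end to end rather than interleaved, and logical extent 0 - where the primary superblock would live - maps to pv0, exactly the PV with the missing UUID. A minimal sketch of that mapping:

```python
# LV "data" segment map, copied from /etc/lvm/backup/vhe8_disks.
# stripe_count = 1 means each segment is linear (concatenated, not interleaved).
segments = [
    {"start": 0,     "count": 11922, "pv": "pv0"},  # the missing disk
    {"start": 11922, "count": 11402, "pv": "pv1"},  # /dev/sda4
    {"start": 23324, "count": 11923, "pv": "pv2"},  # /dev/sdb1
]
EXTENT_MIB = 32  # extent_size = 65536 sectors * 512 bytes = 32 MiB

def pv_for_extent(extent):
    """Which PV holds the given logical extent of the LV."""
    for seg in segments:
        if seg["start"] <= extent < seg["start"] + seg["count"]:
            return seg["pv"]
    raise ValueError("extent beyond end of LV")

# Logical extent 0 (start of the LV, where an ext2/3 superblock sits):
print(pv_for_extent(0))  # pv0 -> on the dead drive
# The first chunk of the LV, filesystem metadata included, was on pv0:
print(11922 * EXTENT_MIB / 1024)  # 372.5625 GiB
```

So it isn't 2/3 of each file: the segments are concatenated, not interleaved. Files whose blocks fell entirely on pv1/pv2 may still be intact on disk, but the superblock and the whole front of the filesystem were on the dead drive, which is consistent with every e2fsck attempt failing.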

