LinuxQuestions.org (/questions/)
-   Linux - Software (https://www.linuxquestions.org/questions/linux-software-2/)
-   -   LVM Reconstructions (https://www.linuxquestions.org/questions/linux-software-2/lvm-reconstructions-4175659027/)

VDIEng 08-12-2019 12:37 PM

LVM Reconstructions
 
Background:
We installed an OVA (VMware 6.0) provided by a manufacturer that contained one LVM volume group. The VG initially contained only one physical volume (250 GB). Through VMware, three additional LUNs (26 TB, 26 TB, and 34 TB) were mapped to appear as local drives, bringing the volume group to ~88 TB total. According to /etc/fstab the logical volume is mounted on /var. After the damage described below, the system is no longer directly bootable as far as I know.
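
For context, the volume group was presumably grown along these lines when the extra LUNs were mapped (we did not do this ourselves; this is only a sketch of the usual procedure, with device names as they appear today):

Code:

# label the new LUNs as PVs, add them to the VG, grow the LV, then grow XFS online
pvcreate /dev/sdc /dev/sdd /dev/sde
vgextend vg_dme /dev/sdc /dev/sdd /dev/sde
lvextend -l +100%FREE /dev/vg_dme/lv_var
xfs_growfs /var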

The base system is RHEL 5.10 and the filesystem uses XFS with V1 inodes.

What Happened:
Another company came in at some point and "unmounted" at least two of the PVs, which caused LVM to flag them as missing. Because the VG could no longer be activated, the system would no longer boot properly. The customer called me in after the system had been damaged and asked that I attempt to recover the data. I don't know whether the original drive order was maintained when the drives were reconnected.

Data:
To access the data I have two choices: a Knoppix recovery CD or a CentOS 5 Live Disk (which matters because of the V1 inodes). Below is output from commands issued while booted into Knoppix, as well as some actions we have taken.

Code:

root@Microknoppix:/etc/lvm/backup# pvs -v
  WARNING: Failed to connect to lvmetad. Falling back to device scanning.
    Wiping internal VG cache
    Wiping cache of LVM-capable devices
    There are 2 physical volumes missing.
    There are 2 physical volumes missing.
  PV        VG    Fmt  Attr PSize    PFree DevSize  PV UUID
  /dev/sdb1  vg_dme lvm2 a--  <249.97g    0  <250.00g WRCYu2-idtX-OxyF-6vYF-NG4K-7zof-gCZx64
  /dev/sdc  vg_dme lvm2 a-m  <26.93t    0    26.93t Q767P1-EdM3-36G7-z7Wq-lLeY-rt89-RsEX9K
  /dev/sdd  vg_dme lvm2 a--    26.93t    0    26.93t tjnwOA-yrZr-Lk67-cnbM-ChwX-ihkI-8SC7qy
  /dev/sde  vg_dme lvm2 a-m  <34.00t    0    34.00t 0cRwrB-1Gr2-U721-KbSm-26Lw-EVVN-Db5DuK

I do have a copy of the LVM backup config file, which showed sdc and sde as missing. I attempted a repair by restoring that config file.

Code:

root@Microknoppix:/tmp/old/etc/lvm/backup# vgcfgrestore -f /tmp/old/etc/lvm/backup/vg_dme vg_dme
  WARNING: Failed to connect to lvmetad. Falling back to device scanning.
  Restored volume group vg_dme
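
In hindsight, a safer sequence would have been to snapshot what LVM currently sees, diff it against the saved backup, and dry-run the restore before committing; something along these lines (paths as in the posts above):

Code:

vgcfgbackup -f /tmp/vg_dme.current vg_dme
diff -u /tmp/vg_dme.current /tmp/old/etc/lvm/backup/vg_dme
vgcfgrestore -t -f /tmp/old/etc/lvm/backup/vg_dme vg_dme   # --test: report only, change nothing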

Here, inline, is the LVM config file that was restored; all the PV UUIDs appear to line up with the blkid output (shown further down).

Code:

# Generated by LVM2 version 2.02.88(2)-RHEL5 (2013-06-25): Wed Jul 24 09:53:05 2019

contents = "Text Format Volume Group"
version = 1

description = "Created *after* executing '/usr/sbin/vgs --noheadings -o name'"

creation_host = "CAP-DME-01"# Linux CAP-DME-01 2.6.18-371.8.1.el5 #1 SMP Fri Mar 28 05:53:58 EDT 2014 x86_64
creation_time = 1563976385# Wed Jul 24 09:53:05 2019

vg_dme {
id = "ACdmPs-JeCn-dMl7-W3IL-LfIE-wOcB-q1cQYQ"
seqno = 14
status = ["RESIZEABLE", "READ", "WRITE"]
flags = []
extent_size = 65536# 32 Megabytes
max_lv = 0
max_pv = 0
metadata_copies = 0

physical_volumes {

pv0 {
id = "WRCYu2-idtX-OxyF-6vYF-NG4K-7zof-gCZx64"
device = "/dev/sdb1"# Hint only

status = ["ALLOCATABLE"]
flags = []
dev_size = 524281212# 249.997 Gigabytes
pe_start = 384
pe_count = 7999# 249.969 Gigabytes
}

pv1 {
id = "tjnwOA-yrZr-Lk67-cnbM-ChwX-ihkI-8SC7qy"
device = "/dev/sdd"# Hint only

status = ["ALLOCATABLE"]
flags = []
dev_size = 57836029608# 26.932 Terabytes
pe_start = 384
pe_count = 882507# 26.932 Terabytes
}

pv2 {
id = "Q767P1-EdM3-36G7-z7Wq-lLeY-rt89-RsEX9K"
device = "/dev/sdc"# Hint only

status = ["ALLOCATABLE"]
flags = ["MISSING"]
dev_size = 57831734641# 26.93 Terabytes
pe_start = 384
pe_count = 882442# 26.93 Terabytes
}

pv3 {
id = "0cRwrB-1Gr2-U721-KbSm-26Lw-EVVN-Db5DuK"
device = "/dev/sde"# Hint only

status = ["ALLOCATABLE"]
flags = ["MISSING"]
dev_size = 73014444032# 34 Terabytes
pe_start = 384
pe_count = 1114111# 34 Terabytes
}
}

logical_volumes {

lv_var {
id = "gLn3Cb-fZyf-xOkk-dL5r-VzWH-Y2VK-TyF28b"
status = ["READ", "WRITE", "VISIBLE"]
flags = []
segment_count = 4

segment1 {
start_extent = 0
extent_count = 7999# 249.969 Gigabytes

type = "striped"
stripe_count = 1# linear

stripes = [
"pv0", 0
]
}
segment2 {
start_extent = 7999
extent_count = 882507# 26.932 Terabytes

type = "striped"
stripe_count = 1# linear

stripes = [
"pv1", 0
]
}
segment3 {
start_extent = 890506
extent_count = 882442# 26.93 Terabytes

type = "striped"
stripe_count = 1# linear

stripes = [
"pv2", 0
]
}
segment4 {
start_extent = 1772948
extent_count = 1114111# 34 Terabytes

type = "striped"
stripe_count = 1# linear

stripes = [
"pv3", 0
]
}
}
}
}
root@Microknoppix:/tmp/old/etc/lvm/backup#

After restoring the config I was able to activate the VG.

Code:

root@Microknoppix:/dev# vgchange -ay vg_dme
  WARNING: Failed to connect to lvmetad. Falling back to device scanning.
  1 logical volume(s) in volume group "vg_dme" now active
root@Microknoppix:/dev#
root@Microknoppix:/tmp# lvscan
  WARNING: Failed to connect to lvmetad. Falling back to device scanning.
  ACTIVE            '/dev/vg_dme/lv_var' [<88.11 TiB] inherit
root@Microknoppix:/tmp#

I thought everything was great until I tried to mount it.

Code:

root@Microknoppix:/mnt# mount -v -t xfs /dev/vg_dme/lv_var /mnt/data/
mount: mount /dev/mapper/vg_dme-lv_var on /mnt/data failed: Structure needs cleaning

That is where I'm stuck at this point. I tried running xfs_repair from both Knoppix and the CentOS Live CD; neither attempt appears to have worked.
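
Before any repair attempt it is also worth capturing what the kernel itself logged for the failed mount; "Structure needs cleaning" is the generic message, and dmesg usually shows the specific XFS complaint behind it:

Code:

dmesg | tail -n 20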

This is the output from Knoppix stating that it doesn't support V1 inodes:
Code:

root@Microknoppix:/mnt/data# xfs_repair -v /dev/vg_dme/lv_var
Phase 1 - find and verify superblock...
xfs_repair: V1 inodes unsupported. Please try an older xfsprogs.

I do have hex dumps of the front of each device that I can share if that would help. Each dump shows LVM data starting at 0x3000.
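
For reference, a dump like that can be reproduced read-only with something along these lines (the LVM2 label sits in sector 1 and the text metadata follows it as plain ASCII):

Code:

# read-only: dump the first 16 KiB of a PV for inspection
dd if=/dev/sdc bs=512 count=32 2>/dev/null | hexdump -C | less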

Next Steps:
I'll take any shot in the dark that people think might work to mount the filesystem and recover the data. I have full access to the system by booting the rescue disk or the Live disk.

Thanks,

VDI Engineering

Other Information that might be of use:
- We are able to mount the original boot drive through a second virtual machine, so the host OS information is still accessible (that's how we got the LVM backup files).

Code:

root@Microknoppix:/etc/lvm/backup# blkid
/dev/cloop0: UUID="2018-05-14-12-45-04-74" LABEL="KNOPPIX_FS" TYPE="iso9660"
/dev/cloop1: UUID="2018-05-14-04-27-08-00" LABEL="KNOPPIX_ADDONS1" TYPE="iso9660"
/dev/zram0: UUID="dd3a7e63-2d6d-4899-a7db-36f129cc7531" TYPE="swap"
/dev/sr0: UUID="2018-05-15-01-26-08-00" LABEL="KNOPPIX_8" TYPE="iso9660" PTUUID="6a2a1840" PTTYPE="dos"
/dev/sda1: LABEL="/" UUID="67128ada-bb5b-48f1-9e8c-ce0e7f6f9fd2" TYPE="ext3" PARTUUID="000ec630-01"
/dev/sda2: LABEL="SWAP-sda2" TYPE="swap" PARTUUID="000ec630-02"
/dev/sdb1: UUID="WRCYu2-idtX-OxyF-6vYF-NG4K-7zof-gCZx64" TYPE="LVM2_member" PARTUUID="00026213-01"
/dev/sdc: UUID="Q767P1-EdM3-36G7-z7Wq-lLeY-rt89-RsEX9K" TYPE="LVM2_member"
/dev/sdd: UUID="tjnwOA-yrZr-Lk67-cnbM-ChwX-ihkI-8SC7qy" TYPE="LVM2_member"
/dev/sde: UUID="0cRwrB-1Gr2-U721-KbSm-26Lw-EVVN-Db5DuK" TYPE="LVM2_member"

root@Microknoppix:/etc/lvm/backup# lvmdiskscan -v
  WARNING: Failed to connect to lvmetad. Falling back to device scanning.
    Wiping cache of LVM-capable devices
  /dev/ram0  [      4.00 MiB]
  /dev/ram1  [      4.00 MiB]
  /dev/sda1  [      26.19 GiB]
  /dev/ram2  [      4.00 MiB]
  /dev/sda2  [      <5.81 GiB]
  /dev/ram3  [      4.00 MiB]
  /dev/ram4  [      4.00 MiB]
  /dev/ram5  [      4.00 MiB]
  /dev/ram6  [      4.00 MiB]
  /dev/ram7  [      4.00 MiB]
  /dev/ram8  [      4.00 MiB]
  /dev/ram9  [      4.00 MiB]
  /dev/ram10 [      4.00 MiB]
  /dev/ram11 [      4.00 MiB]
  /dev/ram12 [      4.00 MiB]
  /dev/ram13 [      4.00 MiB]
  /dev/ram14 [      4.00 MiB]
  /dev/ram15 [      4.00 MiB]
  /dev/sdb1  [    <250.00 GiB] LVM physical volume
  /dev/sdc  [      26.93 TiB] LVM physical volume
  /dev/sdd  [      26.93 TiB] LVM physical volume
  /dev/sde  [      34.00 TiB] LVM physical volume
  0 disks
  18 partitions
  3 LVM physical volume whole disks
  1 LVM physical volume

root@Microknoppix:/etc/lvm/backup# pvck -t /dev/sde
  WARNING: Failed to connect to lvmetad. Falling back to device scanning.
  TEST MODE: Metadata will NOT be updated and volumes will not be (de)activated.
  Found label on /dev/sde, sector 1, type=LVM2 001
  Found text metadata area: offset=4096, size=192512
root@Microknoppix:/etc/lvm/backup#

root@Microknoppix:/etc/lvm/backup# lvs -o +devices
  WARNING: Failed to connect to lvmetad. Falling back to device scanning.
  LV    VG    Attr      LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices
  lv_var vg_dme -wi-----p- <88.11t                                                    /dev/sdb1(0)
  lv_var vg_dme -wi-----p- <88.11t                                                    /dev/sdd(0)
  lv_var vg_dme -wi-----p- <88.11t                                                    /dev/sdc(0)
  lv_var vg_dme -wi-----p- <88.11t                                                    /dev/sde(0)

root@Microknoppix:/mnt/old/etc# xfs_repair -n /dev/vg_dme/lv_var
Phase 1 - find and verify superblock...
xfs_repair: V1 inodes unsupported. Please try an older xfsprogs.
root@Microknoppix:/mnt/old/etc#


rknichols 08-12-2019 01:39 PM

The order of the disks doesn't matter. LVM assembles the array according to the UUID that is in the header of each physical volume. The header in each PV contains all the information that is in that /etc/lvm/backup or /etc/lvm/archive file (in ASCII, just not as nicely formatted), and the "id =" fields for the PVs are used to search for the devices. As stated in the file, the "device =" fields are just a hint to speed up the search.
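
For anyone who wants to eyeball that in-header copy, a read-only peek along these lines will print the same text metadata (offsets per the pvck output in the first post: label in sector 1, metadata area at offset 4096):

Code:

dd if=/dev/sdc bs=4096 skip=1 count=48 2>/dev/null | strings -n 4 | less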

It doesn't make a lot of sense that sdc and sde were missing when the array was being assembled, but magically reappeared for vgcfgrestore, which searches for those same UUIDs and will fail if any are not found.

Anyway, the problem no longer appears to be with LVM, but rather with the XFS filesystem, due to the drives being pulled while the filesystem was mounted. As the message from the Microknoppix xfs_repair states, you need to be using an older version that supports your V1 inodes. Does the CentOS 5 Live Disk not include xfs_repair?
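
A quick way to check what that live environment actually ships (assuming the package is named xfsprogs there):

Code:

rpm -q xfsprogs
xfs_repair -V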

VDIEng 08-12-2019 01:50 PM

You are correct about the disks not just coming back into the set. We took the LVM config file listed above and changed "flags = ["MISSING"]" to "flags = []", then re-ran vgcfgrestore, and that's when the missing disks showed back up. This is how we were able to re-add the disks to the set.
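
For the record, the edit was roughly equivalent to the following (working on a copy of the backup file; the --test dry run is how one would sanity-check it first):

Code:

cp /tmp/old/etc/lvm/backup/vg_dme /tmp/vg_dme.nomissing
sed -i 's/flags = \["MISSING"\]/flags = \[\]/' /tmp/vg_dme.nomissing
vgcfgrestore -t -f /tmp/vg_dme.nomissing vg_dme   # dry run
vgcfgrestore -f /tmp/vg_dme.nomissing vg_dme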

We tried running xfs_repair from the CentOS 5 Live CD and it did run, but there was never any significant output on the command line. We let it run for about 5 days. We then ran the command again with the added parameters (-v -i 10); after roughly 10 minutes (not timed) the command line said "Killed". I hope this added information suggests some next steps.

This was the output from xfs_repair on CentOS 5 during the 5-day run period:
Code:

phase 1 - find and verify superblock...

We concluded that xfs_repair was not making any progress (stalled) by looking at the disk IOPS in VMware: there was no activity to any disk on the VM.
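
From inside the guest, one could equally confirm whether xfs_repair is doing any I/O or CPU work at all; a sketch, assuming the sysstat package is available on the Live CD:

Code:

iostat -dx 5                    # per-device I/O every 5 seconds; all zeros means no progress
top -b -n 1 | grep xfs_repair   # is the process consuming any CPU, or stuck waiting?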

Thanks

rknichols 08-12-2019 02:16 PM

From what I can find, XFS was not a standard part of RHEL 5, and the tools available for it back then were perhaps not the best. Sorry about that. In any event, I cannot help with XFS problems.

syg00 08-12-2019 08:54 PM

How much of that 88T was actual data? And if it's worth spending days on recovering, where are the backups? These are questions the business should be answering; I'm not pissing on you.

What is the data they most care about - is it a proprietary format, or something "known"? Can they provision another 88T, preferably more, if you need to forensically scrape the data? Are they prepared to wait - maybe weeks? Can you re-construct directory structures and filenames as before if required by the software?
If you can't repair/mount the filesystem, there are probably no easy solutions unless you have backups.

VDIEng 08-13-2019 08:16 AM

syg00,

I completely understand what you're saying, and here's what I can tell you; I took nothing you said personally, so no offense taken. The 88 TB was a data tank for recorded videos. It contained the work of about 10 employees over the past 5 years! There was never a backup taken. The system is isolated (limited to about 1500 sq ft in a single building); I know it's not an excuse, but it is what it is, unfortunately. My team performed the build and installation in 2014. Another contractor, who we will say was less than qualified, has had O&M since turnover. I begged them to do backups, but they couldn't even be bothered to check for failed drives or validate daily operations.

There is no additional space to provision, but if the LVM/XFS could be mounted I would attach 10 TB external drives and scrape out the data one directory at a time. The files are all industry-standard video formats (mp4, idx, and some metadata). The system cannot leave the building because of the data stored on it.

It turns out the system has been "broken" since Feb of '19, and they brought us in in July of '19 to try to recover the data. We're at the point of either complete (or partial) recovery or wiping the data tanks. That's not what the customer wants, but they know the O&M team failed miserably!

Any additional ideas on the XFS side? Based on some previous remarks I'm going back over today, mounting the CentOS 5.10 disk to try running xfs_check, the older read-only checker. I also didn't realize you could run it against a single device (e.g. /dev/sdc); I thought it had to be run against the entire VG.
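
The read-only checks I have in mind look roughly like this (neither should modify the filesystem; xfs_check is known to need a lot of memory on a filesystem this size):

Code:

xfs_check /dev/vg_dme/lv_var
xfs_repair -n /dev/vg_dme/lv_var   # "no modify" mode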

rknichols 08-13-2019 10:09 AM

Quote:

Originally Posted by VDIEng (Post 6024656)
I also didn't realize you could run it against a single device (e.g. /dev/sdc); I thought it had to be run against the entire VG.

Where did you hear that? You can't run xfs_repair against just part of a filesystem. If it tries to "fix" anything when run that way, it will just be doing further damage.

You have just a single ~97TB logical volume /dev/vg_dme/lv_var (also accessible as /dev/mapper/vg_dme-lv_var), and that is the "device file" for xfs_repair.

VDIEng 08-13-2019 10:12 AM

Quote:

Originally Posted by rknichols (Post 6024694)
Where did you hear that? You can't run xfs_repair against just part of a filesystem. If it tries to "fix" anything when run that way, it will just be doing further damage.

You have just a single ~97TB logical volume /dev/vg_dme/lv_var (also accessible as /dev/mapper/vg_dme-lv_var), and that is the "device file" for xfs_repair.

Thanks for letting me know. I have only run it against the entire LV to date. I came across that idea in this thread I was reading last night.

https://forums.opensuse.org/showthre...needs-cleaning

I assumed there was an underlying LVM in that thread, but I guess it could also have been XFS on a single drive.

Thanks for saving me some time.

rknichols 08-13-2019 10:28 AM

Quote:

Originally Posted by VDIEng (Post 6024695)
https://forums.opensuse.org/showthre...needs-cleaning

I assumed there was an underlying LVM in that thread, but I guess it could also have been XFS on a single drive.

Indeed, there was no LVM involved in that example.

VDIEng 08-13-2019 02:53 PM

UPDATE
 
Based on the V1-inode requirement, I made one final attempt. I booted the CentOS 5.10 disk into rescue mode, performed a scan and loaded up the disks, and used the old version of vgcfgrestore to reload the config.

Then the magic happened, and I think it was due to an extra mount option.

Trying to mount the LV normally gave the same result, "Structure needs cleaning".
xfs_check said that we needed to run xfs_repair with -L. Instead, I tried mounting read-only with log recovery disabled:

Code:

mount -t xfs -o ro,norecovery /dev/vg_dme/lv_var /mnt/data

For some reason it mounted; I'm guessing the norecovery option bypassed the unsynced log and allowed a read-only mount. I now have access to 36 TB of video files, and my next dilemma is how to get the data out of the system. I'm currently looking at external 10 TB HDDs, but then I have to split the data somehow. I'm also looking at a consumer-grade NAS that would take six HDDs so I could create a single large XFS filesystem and copy the data over directly. Any thoughts would be greatly appreciated, as I have never done a recovery that didn't fit on a single external device.
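
If I go the external-drive route, my working plan is to copy one top-level directory at a time and watch free space between runs; a sketch, with the external drive mount point and directory name purely as placeholders:

Code:

mkdir -p /mnt/ext01/recovered
rsync -aHv --progress /mnt/data/SOME_DIR/ /mnt/ext01/recovered/SOME_DIR/
df -h /mnt/ext01        # check remaining space before starting the next directory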

rknichols 08-13-2019 08:37 PM

First thing would be to have the company decide what they are going to use as a backup solution, then recover the files directly to that. Given the current scare, I suspect that having backups would now be a fairly high-priority item. Note that failure of any one of the 4 drives would make recovery of the filesystem impossible, and they would be reduced to using tools like photorec to dig files and file fragments out of what was left. Doing that for a ~97TB filesystem and figuring out what the original file names might have been would take months, if not years.

syg00 08-13-2019 09:22 PM

So you only need slightly more than 36T - hence my first question. Get the data copied - use the 10T drives if you have to. Copy it a file at a time; data are just data. Worry about putting it back together later, once you have repaired the filesystem. Get a list of the entire filesystem tree as well, along with ownership/permissions.
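
One way to capture that tree listing with ownership and permissions before copying anything (output path is just an example):

Code:

find /mnt/data -printf '%M %u %g %10s %p\n' > /root/filesystem-manifest.txt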

