[SOLVED] One of hard disks of the logical volume failed
I'm not sure you are going to get much help. You can try things like testdisk/photorec, but I don't think you will get much data back. That doesn't mean none, just not much, and not necessarily all of a file.
The problem is that with a linear concatenation, you lose the entire filesystem when either one fails.
The cause of such loss is due to the filesystem allocating both meta-data and data blocks scattered for optimum access. So such allocations will not put all the data on one physical volume.
Had pv0 and pv1 been RAID volumes (other than raid0...), the RAID recovery would have preserved the data.
It's really hard to give exact instructions without knowing the content of the "physical_volumes { ... }" section of that LVM backup file, but I would start with a new disk drive (750 GB or larger), make a 550 GB partition there, use dd to copy segment 1 to the new drive, zero out the rest of the partition (probably already zeros if it's a new drive), and then see what fsck can do to reconstruct that filesystem.
To copy segment 1 you need to run:
Code:
dd if={device for pv0} of={your new partition} bs=1M count=$((69199*4)) skip=$((7050*4 + 1))
For zeroing the rest of the partition (if necessary):
Code:
dd if=/dev/zero of={your new partition} bs=4194304 seek=69199
That "4194304" number is the 4 MiB extent size that pv0 appears to have from the numbers you gave.
If the pv1 drive is not totally dead, you can try to use ddrescue to recover as much data as possible rather than filling the rest of the partition with zeros. If that's the case, let me know and I can give more exact instructions.
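If it comes to that, the ddrescue step might look roughly like this (a sketch only: the device names are hypothetical, and the input offset for where segment 2 starts inside pv1 would have to come from your LVM backup file):

```shell
# Sketch only: copy what is still readable from the failing pv1 drive into the
# region of the new partition that follows the pv0 data.  Segment 1 occupies
# 69199 extents * 8192 sectors/extent = 566878208 sectors, so segment 2 data
# belongs at byte offset 566878208 * 512 in the new partition.
# You would also need --input-position={byte offset of the segment inside pv1}.
ddrescue --force --output-position=$((566878208 * 512)) \
    /dev/sdc1 /dev/sdb1 /root/pv1-rescue.map
```

The map file lets ddrescue resume an interrupted copy and retry the bad areas on later runs instead of starting over.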
Last edited by rknichols; 01-03-2016 at 01:43 PM.
Reason: Correct the dd parameters for copying segment 1
The cause of such loss is due to the filesystem allocating both meta-data and data blocks scattered for optimum access.
Fortunately, not as scattered as you might think. To avoid excessive seeking, the allocator (for ext2/3/4, at least) tries to put the data blocks for a file together in the same block group as that file's inode, and the inodes for files tend to be near the inode for the directory that contains them. Of course all bets are off when the block groups start to fill up (one reason for that 5% reserved space is to have some space available in each block group), but if "home" was originally just on one PV and later extended to a second, all of the old data on that first PV would still be there.
So, it looks like pv0 was on /dev/sda5 (~297851 MiB). That might not be /dev/sda in your rescue environment, so you'll want to use blkid to identify the partition unless it's obvious which disk is which. Also, I've changed the dd parameters for copying segment 1. They were wrong before since the pe_start offset is in units of 512-byte sectors, not 4 MiB extents. I'm pretty sure it's right now. Run the file command against the new partition to be sure: it should pick up the identity of the filesystem. That says the start point is right, and I know the "count=$((69199*4))" is right.
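That check might look like this (partition name hypothetical; substitute whatever blkid showed for your new partition):

```shell
# Read the raw device and report what is found at its start.  If the dd copy
# landed on the right boundary, this should identify the filesystem, e.g.
# "Linux rev 1.0 ext4 filesystem data, UUID=..." rather than just "data".
file -s /dev/sdb1
```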
There will be no indication from fsck about what files are lost since the directories they were in are probably gone too. Also, fsck just makes the filesystem metadata consistent. It has no way to check the content of files. I suppose one way to tell what files (the ones that still exist) have blocks in the missing area would be to create a dmsetup mapping with that whole region mapped to the error target. That might be something to try even without attempting fsck, but get that copying done first so that there is something to work with without risking the original data.
I'll do the copy as soon as I get a new hard disk. In the meantime could you please tell me how to do this?
After you create a partition on the new disk, just follow the instructions I gave back in #3:
Code:
dd if={device for pv0} of={your new partition} bs=1M count=$((69199*4)) skip=$((7050*4 + 1))
# which is probably
dd if=/dev/sda5 of=/dev/sdb1 bs=1M count=$((69199*4)) skip=$((7050*4 + 1))
Do be sure that /dev/sda and /dev/sdb are the correct disks first. I generally do "cat /proc/partitions" and look at the output for confirmation. (The sizes there are in units of 1K blocks.)
Sorry, totally misunderstood. I did some experimenting and found that the error mapping might not be very useful. Due to the I/O errors, the filesystem can't be mounted. You can get in and poke around with debugfs, but getting any info that way would be beyond tedious. I'll think about this for a while and see if I can come up with anything useful.
First, create a file /tmp/mymap with the following content:
Code:
0 566878208 linear /dev/sdb1 0
566878208 488390656 error
That 566878208 is the number of 512-byte sectors in the 69199 4MiB extents of pv0, and 488390656 is for the 59618 extents of pv1.
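Those sector counts can be double-checked with shell arithmetic, since each 4 MiB extent is 8192 sectors of 512 bytes (and 128817 − 69199 = 59618):

```shell
echo $((69199 * 8192))    # sectors in pv0's 69199 extents: 566878208
echo $((59618 * 8192))    # sectors in pv1's 59618 extents: 488390656
echo $((128817 * 8192))   # total for the LV's 128817 extents: 1055268864
```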
Now, run
Code:
dmsetup create badhome /tmp/mymap
You now have a device /dev/mapper/badhome that will return an I/O error for any reference to sectors beyond what was mapped from /dev/sdb1.
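A few dmsetup commands are useful for sanity-checking the mapping and cleaning up afterwards (the "badhome" name matches the device created above):

```shell
dmsetup table badhome    # print the active mapping: one linear and one error target
dmsetup status badhome   # confirm both segments are loaded
# Sanity check on the table: 566878208 + 488390656 = 1055268864 sectors,
# i.e. the full 128817-extent size of the original LV.
# When finished with the mapped device:
dmsetup remove badhome
```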
I have great news. Today I connected the disk to another PC and it worked; then I connected it back to the original machine and it works as well. I recreated the logical volume setup to its original state and tried mounting /home. I got an error:
Code:
[ 2404.541803] EXT4-fs (dm-4): bad geometry: block count 131908608 exceeds size of device (70859776 blocks)
My current logical volume setup:
Code:
root@lieta:/etc/lvm/archive# lvdisplay --units b
...
--- Logical volume ---
LV Path /dev/debian-vg/home
LV Name home
VG Name debian-vg
LV UUID xYzC2U-xLAo-PfTs-5mjA-EwXj-d2c1-gWDhcN
LV Write Access read/write
LV Creation host, time debian, 2015-06-22 11:47:59 +0300
LV Status available
# open 0
LV Size 540297658368 B
Current LE 128817
Segments 2
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 254:4
root@lieta:/etc/lvm/archive# ls -l /dev/debian-vg/home
lrwxrwxrwx 1 root root 7 jan 5 19:34 /dev/debian-vg/home -> ../dm-4
root@lieta:/etc/lvm/archive# ls -l /dev/dm-4
540297658368/(1024*4)==131908608. Why doesn't it mount?
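For reference, the two block counts in that error work out as follows (each 4 MiB extent is 1024 blocks of 4 KiB, so the 70859776 figure matches the 69199 extents of segment 1 exactly, which suggests only the pv0 segment is actually backing the device):

```shell
echo $((540297658368 / 4096))   # blocks the ext4 superblock expects: 131908608
echo $((69199 * 1024))          # 4 KiB blocks in segment 1 alone: 70859776
```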
I had high hopes that testdisk file recovery would help find what files were corrupted, but when asked to recover all files it recovers a mish-mash of intact files, deleted files, and partially recovered files. You can identify the partially recovered files by the size difference. So, what I suggest is:
Copy the partial filesystem to a new partition and pad with zeros to the original size (at least) as previously described.
Run [fsck] on that new partition to get a sane filesystem there.
Create a mapped device from that partition with the padded region mapped to the error target as previously described.
Make a recovery directory somewhere with enough space to hold the recovered files.
Run testdisk on the mapped device, select "Unpartitioned device", and go into "Advanced file recovery".
Type "a" to select all files, then "C" (upper case) to copy selected files. Select your recovery directory as the target. Go have lunch while it works.
After exiting testdisk, mount the new partition (the whole thing -- not the error-mapped version) read-only on /mnt/tmp. Then you can run
Code:
cd {your recovery directory}
find . -type f -exec test -f "/mnt/tmp/{}" \; -exec cmp {} "/mnt/tmp/{}" \;
That should (a) skip over any deleted files that were recovered (files that don't exist in /mnt/tmp) and (b) cause an "EOF on ..." failure message from cmp for any files that were partially recovered.
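A variant of that find command prints only the names of the partially recovered files instead of cmp's error messages (assuming GNU find, which substitutes {} even inside a longer argument, as the original command already relies on):

```shell
# Run from the recovery directory.  For each regular file that also exists
# under /mnt/tmp, compare quietly; print the name only when the contents differ.
find . -type f -exec test -f "/mnt/tmp/{}" \; \
    ! -exec cmp -s {} "/mnt/tmp/{}" \; -print
```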
As I said before, there will be no way to tell what files were totally lost. Their names exist only in the missing part of the filesystem.
That's the best I can come up with. I've already spent too much time on this, but it's been quite educational for me.