LinuxQuestions.org
Old 01-03-2016, 08:57 AM   #1
Lieta
LQ Newbie
 
Registered: Nov 2013
Posts: 24

Rep: Reputation: Disabled
One of hard disks of the logical volume failed


Hi.
One of two hard disks that a logical volume was on failed.
Code:
                home {
                        id = "xYzC2U-xLAo-PfTs-5mjA-EwXj-d2c1-gWDhcN"
                        status = ["READ", "WRITE", "VISIBLE"]
                        flags = []
                        creation_host = "debian"
                        creation_time = 1434962879      # 2015-06-22 11:47:59 +0300
                        segment_count = 2

                        segment1 {
                                start_extent = 0
                                extent_count = 69199    # 270,309 Gigabytes

                                type = "striped"
                                stripe_count = 1        # linear

                                stripes = [
                                        "pv0", 7050
                                ]
                        }
                        segment2 {
                                start_extent = 69199
                                extent_count = 59618    # 232,883 Gigabytes

                                type = "striped"
                                stripe_count = 1        # linear

                                stripes = [
                                        "pv1", 0
                                ]
                        }
                }
Here "home" is the logical volume and "pv1" is the physical volume that failed. How should I proceed to recover as much data as possible?
 
Old 01-03-2016, 09:18 AM   #2
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,912

Rep: Reputation: 1513
I'm not sure you are going to get much help. You can try things like testdisk/photorec, but I don't think you will get much data back. That doesn't mean none, just not much, and not necessarily all of a file.

The problem is that with a linear concatenation, you lose the entire filesystem when either one fails.

The loss happens because the filesystem scatters both metadata and data blocks for optimum access, so allocations do not stay on one physical volume.

Had pv0 and pv1 been raid volumes (other than raid0...), the raid recovery would have preserved the data.
 
1 members found this post helpful.
Old 01-03-2016, 09:42 AM   #3
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,774

Rep: Reputation: 2211
Be prepared to cry a lot.

It's really hard to give exact instructions without knowing the content of the "physical_volumes { ... }" section of that LVM backup file, but I would start with a new disk drive (750 GB or larger), make a 550 GB partition there, use dd to copy segment1 to the new drive, zero out the rest of the partition (probably already zeros if it's a new drive), and then see what fsck can do to reconstruct that filesystem.

To do that copying of segment 1 you need to run:
Code:
dd if={device for pv0} of={your new partition} bs=1M count=$((69199*4)) skip=$((7050*4 + 1))
For zeroing the rest of the partition (if necessary):
Code:
dd if=/dev/zero of={your new partition} bs=4194304 seek=69199
That "4194304" number is the 4 MiB extent size that pv0 appears to have from the numbers you gave.
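
As a rough check on the zeroing step, the arithmetic can be sketched in shell. The extent counts are taken from the metadata posted in #1; the 4 MiB extent size is the value inferred above, not something read from the disk. The sketch only prints numbers and a bounded variant of the dd command (with `count=` added as a suggestion, so dd stops after segment 2 instead of running until the partition is full); it does not write anything.

```shell
extent_bytes=4194304      # 4 MiB extent size inferred from the metadata
seg1_extents=69199        # extent_count of segment1 (on pv0)
seg2_extents=59618        # extent_count of segment2 (on the failed pv1)

# Byte offset where zeroing starts, and the minimum size of the new partition.
seek_bytes=$((seg1_extents * extent_bytes))
total_bytes=$(( (seg1_extents + seg2_extents) * extent_bytes ))

echo "zeroing starts at byte $seek_bytes"
echo "partition must hold at least $total_bytes bytes"

# Bounded variant of the zeroing command; the target is still a placeholder.
echo "dd if=/dev/zero of={your new partition} bs=$extent_bytes seek=$seg1_extents count=$seg2_extents"
```

Without `count=`, dd simply keeps writing zeros until it hits the end of the partition and reports "No space left on device", which is harmless here.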

If the pv1 drive is not totally dead, you can try to use ddrescue to recover as much data as possible rather than filling the rest of the partition with zeros. If that's the case, let me know and I can give more exact instructions.

Last edited by rknichols; 01-03-2016 at 01:43 PM. Reason: Correct the dd parameters for copying segment 1
 
2 members found this post helpful.
Old 01-03-2016, 10:08 AM   #4
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,774

Rep: Reputation: 2211
Quote:
Originally Posted by jpollard View Post
The cause of such loss is due to the filesystem allocating both meta-data and data blocks scattered for optimum access.
Fortunately, not as scattered as you might think. To avoid excessive seeking, the allocator (for ext2/3/4, at least) tries to put the data blocks for a file together in the same block group as that file's inode, and the inodes for files tend to be near the inode for the directory that contains them. Of course all bets are off when the block groups start to fill up (one reason for that 5% reserved space is to have some space available in each block group), but if "home" was originally just on one PV and later extended to a second, all of the old data on that first PV would still be there.
 
Old 01-03-2016, 11:18 AM   #5
Lieta
LQ Newbie
 
Registered: Nov 2013
Posts: 24

Original Poster
Rep: Reputation: Disabled
Thanks a lot, rknichols. pv1 is totally dead, not even detected. Here's the "physical_volumes { ... }" you requested.
Code:
        physical_volumes {

                pv0 {
                        id = "aycqo0-UjXU-eEma-UHny-JCTh-Qtmn-sSm3gO"
                        device = "/dev/sda5"    # Hint only

                        status = ["ALLOCATABLE"]
                        flags = []
                        dev_size = 624637952    # 297,851 Gigabytes
                        pe_start = 2048
                        pe_count = 76249        # 297,848 Gigabytes
                }

                pv1 {
                        id = "wLDgiA-zdQT-TVQN-miYn-4Zjo-mUge-bj6F0f"
                        device = "unknown device"       # Hint only

                        status = ["ALLOCATABLE"]
                        flags = ["MISSING"]
                        dev_size = 488397168    # 232,886 Gigabytes
                        pe_start = 2048
                        pe_count = 59618        # 232,883 Gigabytes
                }
        }

Last edited by Lieta; 01-03-2016 at 11:31 AM.
 
Old 01-03-2016, 12:15 PM   #6
Lieta
LQ Newbie
 
Registered: Nov 2013
Posts: 24

Original Poster
Rep: Reputation: Disabled
It is also important to know which files have been lost or have corrupt contents. Will fsck show that?
 
Old 01-03-2016, 02:03 PM   #7
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,774

Rep: Reputation: 2211
So, it looks like pv0 was on /dev/sda5 (~297.85 GiB). That might not be /dev/sda in your rescue environment, so you'll want to use blkid to identify the partition unless it's obvious which disk is which. Also, I've changed the dd parameters for copying segment 1. They were wrong before since the pe_start offset is in units of 512-byte sectors, not 4 MiB extents. I'm pretty sure it's right now. You can try this:
Code:
dd if=/dev/sda5 bs=1M count=2 skip=$((7050*4 + 1)) | file -s -
to be sure. The file command should pick up the identity of the filesystem. That says the start point is right, and I know the "count=$((69199*4))" is right.
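
For anyone checking the numbers, here is a small sketch of how the skip and count values fall out of the metadata. The variable names are made up for illustration; the values are the ones posted in this thread (pe_start in 512-byte sectors, 4 MiB extents, segment 1 starting at physical extent 7050 of pv0).

```shell
pe_start_sectors=2048     # pe_start of pv0, in 512-byte sectors
extent_mib=4              # extent size in MiB, as inferred in this thread
first_pe=7050             # stripes = [ "pv0", 7050 ] in segment1
extent_count=69199        # extent_count of segment1

# pe_start converted to MiB (2048 * 512 bytes = 1 MiB), then add the extent offset.
pe_start_mib=$((pe_start_sectors * 512 / 1048576))
skip_mib=$((pe_start_mib + first_pe * extent_mib))
count_mib=$((extent_count * extent_mib))

echo "skip=$skip_mib count=$count_mib"   # prints: skip=28201 count=276796
```

With bs=1M, those are the same values as skip=$((7050*4 + 1)) and count=$((69199*4)).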

There will be no indication from fsck about what files are lost since the directories they were in are probably gone too. Also, fsck just makes the filesystem metadata consistent. It has no way to check the content of files. I suppose one way to tell what files (the ones that still exist) have blocks in the missing area would be to create a dmsetup mapping with that whole region mapped to the error target. That might be something to try even without attempting fsck, but get that copying done first so that there is something to work with without risking the original data.
 
1 members found this post helpful.
Old 01-03-2016, 03:00 PM   #8
Lieta
LQ Newbie
 
Registered: Nov 2013
Posts: 24

Original Poster
Rep: Reputation: Disabled
Disk is /dev/sda, since it's the only one left now.
Code:
$ sudo dd if=/dev/sda5 bs=1M count=2 skip=$((7050*4 + 1)) | file -s -
/dev/stdin: Linux rev 1.0 ext4 filesystem data, UUID=7d25087b-a730-4621-a49d-360380911cd4 (errors) (extents) (large files) (huge files)
Is this ok?
 
Old 01-03-2016, 03:05 PM   #9
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,774

Rep: Reputation: 2211
Excellent!! Proceed with the copying.
 
Old 01-04-2016, 02:56 AM   #10
Lieta
LQ Newbie
 
Registered: Nov 2013
Posts: 24

Original Poster
Rep: Reputation: Disabled
I'll do the copy as soon as I get a new hard disk. In the meantime could you, please, tell how to do this?
Quote:
Originally Posted by rknichols View Post
create a dmsetup mapping with that whole region mapped to the error target
I won't do this until I make a copy, of course.
 
Old 01-04-2016, 08:41 AM   #11
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,774

Rep: Reputation: 2211
Quote:
Originally Posted by Lieta View Post
I'll do the copy as soon as I get a new hard disk. In the meantime could you, please, tell how to do this?
After you create a partition on the new disk, just follow the instruction I gave back in #3:
Code:
dd if={device for pv0} of={your new partition} bs=1M count=$((69199*4)) skip=$((7050*4 + 1))
# which is probably
dd if=/dev/sda5 of=/dev/sdb1 bs=1M count=$((69199*4)) skip=$((7050*4 + 1))
Do be sure that /dev/sda and /dev/sdb are the correct disks first. I generally do "cat /proc/partitions" and look at the output for confirmation. (The sizes there are in units of 1K blocks.)
 
Old 01-04-2016, 09:25 AM   #12
Lieta
LQ Newbie
 
Registered: Nov 2013
Posts: 24

Original Poster
Rep: Reputation: Disabled
That part is clear. I meant: how do I set up the dmsetup mapping to identify which files are in the "lost" area?
 
Old 01-04-2016, 10:54 AM   #13
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,774

Rep: Reputation: 2211
Sorry, totally misunderstood. I did some experimenting and found that the error mapping might not be very useful. Due to the I/O errors, the filesystem can't be mounted. You can get in and poke around with debugfs, but getting any info that way would be beyond tedious. I'll think about this for a while and see if I can come up with anything useful.

First, create a file /tmp/mymap with the following content:
Code:
0 566878208 linear /dev/sdb1 0
566878208 488390656 error
That 566878208 is the number of 512-byte sectors in the 69199 4 MiB extents of pv0, and 488390656 is for the 59618 extents of pv1.

Now, run
Code:
dmsetup create badhome /tmp/mymap
You now have a device /dev/mapper/badhome that will return an I/O error for any reference to sectors beyond what was mapped from /dev/sdb1.
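
The two sector counts in that table can be derived rather than typed by hand. The following is only a sketch: it assumes the 4 MiB extent size and the /dev/sdb1 copy used above, and it prints the table instead of writing /tmp/mymap or calling dmsetup.

```shell
# Extent counts come from the LVM metadata posted earlier in the thread.
extent_sectors=$((4 * 1024 * 1024 / 512))    # one 4 MiB extent = 8192 sectors
seg1_sectors=$((69199 * extent_sectors))     # copied data, present on /dev/sdb1
seg2_sectors=$((59618 * extent_sectors))     # missing region, mapped to "error"

# dmsetup table lines: <start> <length> <target> [target arguments]
table=$(printf '0 %d linear /dev/sdb1 0\n%d %d error' \
    "$seg1_sectors" "$seg1_sectors" "$seg2_sectors")
printf '%s\n' "$table"

# After "dmsetup create badhome /tmp/mymap", the mapped size can be checked with:
#   blockdev --getsz /dev/mapper/badhome
# and the mapping removed again with:
#   dmsetup remove badhome
```

blockdev --getsz reports the size in 512-byte sectors, so it should equal the sum of the two lengths.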
 
2 members found this post helpful.
Old 01-05-2016, 12:26 PM   #14
Lieta
LQ Newbie
 
Registered: Nov 2013
Posts: 24

Original Poster
Rep: Reputation: Disabled
I have great news. Today I connected the disk to another PC and it worked, then connected back to original - works as well. I recreated the logical volume setup to original state and tried mounting /home. I got an error:
Code:
[ 2404.541803] EXT4-fs (dm-4): bad geometry: block count 131908608 exceeds size of device (70859776 blocks)
My current logical volume setup:
Code:
root@lieta:/etc/lvm/archive# lvdisplay --units b
...
  --- Logical volume ---
  LV Path                /dev/debian-vg/home
  LV Name                home
  VG Name                debian-vg
  LV UUID                xYzC2U-xLAo-PfTs-5mjA-EwXj-d2c1-gWDhcN
  LV Write Access        read/write
  LV Creation host, time debian, 2015-06-22 11:47:59 +0300
  LV Status              available
  # open                 0
  LV Size                540297658368 B
  Current LE             128817
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           254:4

root@lieta:/etc/lvm/archive# ls -l /dev/debian-vg/home
lrwxrwxrwx 1 root root 7 jan  5 19:34 /dev/debian-vg/home -> ../dm-4
root@lieta:/etc/lvm/archive# ls -l /dev/dm-4
540297658368/(1024*4)==131908608. Why doesn't it mount?

EDIT: After reboot everything is fine.
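
A note on the numbers in that error message, as a sketch only. The device size the kernel reported (70859776 blocks of 4 KiB) works out to exactly the size of segment 1. One possible reading, which is an assumption and not confirmed anywhere in this thread, is that the active device-mapper table still covered only the first segment until the reboot.

```shell
block_size=4096
fs_blocks=131908608       # block count from the EXT4-fs "bad geometry" message
dev_blocks=70859776       # device size from the same message, in 4 KiB blocks
extent_bytes=4194304      # 4 MiB extents
seg1_extents=69199        # extent_count of segment1

dev_bytes=$((dev_blocks * block_size))
seg1_bytes=$((seg1_extents * extent_bytes))
fs_bytes=$((fs_blocks * block_size))

echo "filesystem expects $fs_bytes bytes"
echo "device offered     $dev_bytes bytes"
if [ "$dev_bytes" -eq "$seg1_bytes" ]; then
    echo "device size equals segment 1 alone"
fi
```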

Last edited by Lieta; 01-05-2016 at 12:51 PM.
 
Old 01-05-2016, 12:40 PM   #15
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,774

Rep: Reputation: 2211
I had high hopes that testdisk file recovery would help find what files were corrupted, but when asked to recover all files it recovers a mish-mash of intact files, deleted files, and partially recovered files. You can identify the partially recovered files by the size difference. So, what I suggest is:
  1. Copy the partial filesystem to a new partition and pad with zeros to the original size (at least) as previously described.
  2. Run fsck on that new partition to get a sane filesystem there.
  3. Create a mapped device from that partition with the padded region mapped to the error target as previously described.
  4. Make a recovery directory somewhere with enough space to hold the recovered files.
  5. Run testdisk on the mapped device, select "Unpartitioned device", and go into "Advanced file recovery".
  6. Type "a" to select all files, then "C" (upper case) to copy selected files. Select your recovery directory as the target. Go have lunch while it works.
After exiting testdisk, mount the new partition (the whole thing -- not the error-mapped version) read-only on /mnt/tmp. Then you can run
Code:
cd {your recovery directory}
find . -type f -exec test -f "/mnt/tmp/{}" \; -exec cmp {} "/mnt/tmp/{}" \;
That should (a) skip over any deleted files that were recovered (files that don't exist in /mnt/tmp) and (b) cause an "EOF on ..." failure message from cmp for any files that were partially recovered.
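
If a plain list of the affected files is more useful than the cmp messages, the same comparison can be wrapped in a small function. This is a sketch, not a tested recovery tool: list_damaged is a made-up name, the second argument must be an absolute path, and file names containing newlines are not handled.

```shell
# Print recovered files that exist on the mounted filesystem but whose
# content differs (for example because the recovered copy is truncated).
list_damaged() {
    recovered=$1    # directory testdisk copied the files into
    mounted=$2      # absolute path of the read-only mount, e.g. /mnt/tmp
    (
        cd "$recovered" || exit 1
        find . -type f | while IFS= read -r f; do
            # Skip recovered deleted files that have no counterpart on the mount.
            [ -f "$mounted/$f" ] || continue
            # cmp -s is silent and returns non-zero when the files differ.
            cmp -s "$f" "$mounted/$f" || printf '%s\n' "$f"
        done
    )
}

# Example call (paths are placeholders):
#   list_damaged /path/to/recovery /mnt/tmp > damaged-files.txt
```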

As I said before, there will be no way to tell what files were totally lost. Their names exist only in the missing part of the filesystem.

That's the best I can come up with. I've already spent too much time on this, but it's been quite educational for me.
 
1 members found this post helpful.
  

