LinuxQuestions.org (/questions/)
-   Linux - General (https://www.linuxquestions.org/questions/linux-general-1/)
-   -   Recover data from full image disk (https://www.linuxquestions.org/questions/linux-general-1/recover-data-from-full-image-disk-4175522277/)

rknichols 10-21-2014 07:12 PM

Quote:

Originally Posted by Nogitsune (Post 5257370)
-- edit --
If you haven't mounted the XFS yet with the changed logical volume alignment, please don't. Stop what you're doing now.

Sorry, I may be totally off with everything I wrote here, and indeed it's possible that mounting it that way might restore everything. But if by chance what I'm thinking actually happened, then it's probably also possible that the mount would totally trash the partition. So I'm encouraging a bit more investigation into this first, and, if at all possible, a full backup.

Agreed. I see it as a huge fault of xfs that there is no way to check a filesystem without writing to it and possibly making a mess (i.e., no equivalent for "fsck -n").
Quote:

On the other hand if you mess anything up - wrong chunk size, wrong order of disks.. and let it slip to running the resync, it's all gone.
That point has already been passed. The system has been running with the RAID up and the VG structure visible within it.
Quote:

What I still don't understand then is, why the LVM alignment would be messed up. If there's extra padding between LVM start and start of XFS, and we assume this XFS header is the original one.. wouldn't that mean that LVM blocks are starting from a spot earlier than they used to?
My theory is that some change to the RAID options made the RAID header 8192 bytes smaller than it had been. I hadn't thought about RAID stripe misalignment. Looking again at that RAID superblock, it appears to have a decidedly non-default chunk size of 512K (0x400 = 1024 512-byte sectors). [Edit: 512K is the default for version 1.2 metadata] The only thing we have to go on at this point is that the xfs superblocks all appear at consistent locations, but since every one of those offsets is an exact multiple of the RAID chunk size, that really doesn't confirm anything.

Here is a suggestion. Go ahead and adjust the LVM pe_start value, but don't try to mount or check anything. Now use dd to copy an image of just one of the LVs to another device. Start with one of the smaller ones, like prod_portables or lab_templates (just 100GB each). Then try to mount and check that new image. That will leave the original data safe, but tell you what would have happened had you tried that on the original. That's a lot quicker than imaging the whole 3.6TB VG.
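
Something like this minimal sketch, assuming the VG is named data; the scratch and mount paths are made up, and the target device needs ~100GB free:
Code:

# copy one small LV to an image file on a separate device
dd if=/dev/data/lab_templates of=/scratch/lab_templates.img bs=1M

# the copy is expendable, so it is safe to let xfs_repair replay the log;
# -f tells xfs_repair that the target is a regular file, not a device
xfs_repair -f /scratch/lab_templates.img

# if that looks sane, try mounting the copy through a loop device
mount -o loop /scratch/lab_templates.img /mnt/test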

Nogitsune 10-21-2014 08:21 PM

Quote:

Originally Posted by rknichols (Post 5257415)
That point has already been passed. The system has been running with the RAID up and the VG structure visible within it.

That's true to a point, but I'm optimistically paranoid. :scratch: Say for example that the original chunk size was 256k instead of 512k. The mirror is below the stripe, so it would not be directly affected, but the stripes would be misaligned: the 256k chunks would land in a 1-3-2-4-5-7-6-8 pattern instead of 1-2-3-4-5-6-7-8. Chunks 1, 4, 5 and 8 would still be where you expect to find them (although further shuffled by the 8192 defect). As long as you avoid writing to the disks, the damage at this point would probably still not be catastrophic. Or if you striped the mirrors in the wrong order, you'd get 3-4-1-2-7-8-5-6; since the XFS header appeared close to where it was supposed to be, that doesn't seem plausible. Mirroring the wrong volumes together would probably either have been a lost cause already, or the resync would have miraculously copied the data to the "faulty" mirrors and essentially restored the array. So I guess the only relevant case might be the possibility that the current chunk size is different from the original one.

Quote:

My theory is that some change to the RAID options made the RAID header 8192 bytes smaller than it had been. I hadn't thought about RAID stripe misalignment. Looking again at that RAID superblock, it appears to have a decidedly non-default chunk size of 512K (0x400 = 1024 512-byte sectors). The only thing we have to go on at this point is that the xfs superblocks all appear at consistent locations, but since every one of those offsets is an exact multiple of the RAID chunk size, that really doesn't confirm anything.

Here is a suggestion. Go ahead and adjust the LVM pe_start value, but don't try to mount or check anything. Now use dd to copy an image of just one of the LVs to another device. Start with one of the smaller ones, like prod_portables or lab_templates (just 100GB each). Then try to mount and check that new image. That will leave the original data safe, but tell you what would have happened had you tried that on the original. That's a lot quicker than imaging the whole 3.6TB VG.
A dd image sounds like a nice, nondestructive way to check things out, and 100GB should be plenty to see the results. If it fails, though, and shifting the whole RAID by 8192 becomes relevant (basically by creating a single partition on each disk at the right location and using it as the container for the RAID), then I'd go with disk clones. Cloning just one disk of each mirror and creating the RAID with missing disks would be sufficient (see the sketch below). Time consuming, I agree - it would likely take around 8-10 hours to run the dd.
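
For the record, that degraded assembly would look roughly like the sketch below. It assumes an md RAID10 with the near-2 layout (adjacent devices form the mirror pairs); the device names are made up, and the chunk size, layout, metadata version and device order all have to match the original array exactly, so treat it only as an outline:
Code:

# clone one member of each mirror pair onto spare disks
dd if=/dev/sdc of=/dev/sde bs=1M
dd if=/dev/sdd of=/dev/sdf bs=1M

# build a degraded array from the clones only; "missing" keeps the
# absent mirror halves out of the array, so no resync can run
mdadm --create /dev/md1 --level=10 --layout=n2 --raid-devices=4 \
      --chunk=512 --metadata=1.2 /dev/sde missing /dev/sdf missing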

littleball 10-22-2014 08:51 AM

Hello guys,

From now on, my boss is in on this too :) (I sent him the URL of this thread; he read it and agrees with all the comments). We haven't done anything yet, since we're talking about a production server; on working days it needs to be up and running, except weekends. For the moment we're planning to try some of your suggestions, but on a dd image like rknichols suggested... it's the safest way. It's a task that's going to take a few days, since we cannot overload the server too much during working hours (and we know how dd eats CPU). Probably a few of you already want to know how all this is going to end (me too). I'll keep you updated with all the steps we're going to go through over the following days.

The XFS wiki page indicates that it's possible to mount an XFS partition read-only and use the XFS tools to try to repair it, but for some reason I am unable to mount this one read-only. So maybe the wiki was talking about a very old version of XFS, or this XFS was compiled with some option that disallows read-only mounts. That's a shame, since other filesystems do allow the system to be mounted read-only; I imagine that if the root partition were XFS and something nasty happened, you couldn't even boot your system into single-user mode? The real cause doesn't matter now; I guess we need to work with what we have.

Nogitsune

Unfortunately, I can't tell if the current chunk size of the disks in the RAID is the same as the original. But I can tell you that about 2 weeks before this RAID got "mysteriously" damaged, one of the 4 disks failed, so my co-worker removed the failed disk and let the RAID run with 3 disks, and a few days later he inserted a new disk into this RAID. The RAID was working fine with the 4 disks, and then suddenly one morning there was no RAID, no LVM, nothing.. We still don't know what happened or what caused this.

Quote:

are you absolutely certain that the disks are put into the current raid in exactly the same order and configuration as they were previously (which disks are mirrored with which ones, and which order the mirrors are striped in)?
One disk did fail a few days before the damage; my co-worker replaced it, and the RAID was up and running fine for about 10 days, then suddenly the damage appeared. I suspect the damage came from that day. Even if the RAID was up and running fine with the 4 disks after the replacement, there has to have been something. A misconfiguration? The system not being able to resync and mirror data onto the disks? I honestly don't know.

rknichols 10-22-2014 10:37 AM

Quote:

Originally Posted by littleball (Post 5257675)
XFS wiki page, indicates that is possible to mount an XFS partition on read-only and use XFS tools to try to repair that, but for some reason, I am unable to mount it read only with XFS, ...

I found that xfs_repair does have a "-n" option that will allow it to run on a read-only mapping of the device. I gave it a try on a deliberately broken xfs filesystem (I disconnected the drive while the filesystem was still mounted), and it avoids the fatal error of insisting that the log needs to be replayed. You could give that a try, but for safety only on a "losetup -r" read-only mapping of the device. Problem is, I don't know what sort of errors to expect on an xfs filesystem that is just in a slightly inconsistent state (because of the crash) vs. one that is totally scrambled.
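
For reference, that safety net would look something like this; the loop device and the LV path are placeholders:
Code:

# -r maps the device read-only, so nothing can write to the original
losetup -r /dev/loop0 /dev/data/lab_templates

# -n is "no modify" mode: report problems, change nothing
xfs_repair -n /dev/loop0

# detach when done
losetup -d /dev/loop0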

Nogitsune 10-22-2014 11:08 AM

I can't really say how your RAID implementation handles the spare disk. If it were the kind of software RAID I use myself, I'd have to run the mdadm command to add the spare to the array, and then the reconstruction would do its magic and all would be fine (I use RAID on disk partitions, so I'd first need to partition the new disk and then add the partition to the RAID instead of the whole disk). However, in your case it's possible the whole thing worked automatically, or maybe the co-worker in question ran the needed commands to add it (something like the sketch below). I suppose it's also possible that the disk was never actually added to the array.. and then 10 days later another disk from the same mirror fell out of the RAID and caused the whole thing to fail. I couldn't really say.
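
For what it's worth, on my own setup the replacement procedure would be something like the sketch below. The device names are hypothetical, and /dev/md127 is only a guess based on the "--minor 127" in your pvscan lines:
Code:

# copy the partition table from a surviving member to the new disk
sfdisk -d /dev/sda | sfdisk /dev/sdb

# add the new partition; the kernel then starts rebuilding the mirror
mdadm --manage /dev/md127 --add /dev/sdb1

# watch the reconstruction progress
cat /proc/mdstat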

Either way, it seems the RAID is up and running right now, and hopefully there is no severe corruption of the data. What I'm mostly worried about is the possibility of the following scenario:

1) For one reason or another, the RAID failed and wouldn't reassemble properly. To restore it, the decision was to do something akin to 'mdadm --create' - basically to recreate the whole RAID from the original disks, writing a new header etc. If this is done exactly right, the RAID will be restored and you'll be able to access your original data with no further losses. This is basically what I did for my own failed RAID6, and at the end of the day I got everything important restored. But..

2) Something went wrong. What I feel is most likely is that, as rknichols pointed out, the RAID is for some reason misaligned by 8k. And I'm worried about what this does to the original data. A striped RAID on two disks works by writing 'chunk-size' amounts of data (512k in this case) alternating between disk 0 and disk 1. So for each 1M of data, the first half is written to disk 0 and the second half to disk 1. Now, ignoring the RAID header, assume that the original RAID was aligned on each disk at a position starting from 8k. Chunks would be written at locations (0+8k) = 8k, (512k+8k) = 520k, (1024k+8k) = 1032k and so on. To read a long strip of data, you'd assemble it from disk0:8k-520k + disk1:8k-520k + disk0:520k-1032k + disk1:520k-1032k and so forth. Now, if the RAID was recreated without that 8k gap at the start of the disk (as it seems), then the original data is still placed in those same chunks (8k-520k, 520k-1032k and so on), but the new RAID will think the data is instead in chunks 0-512k, 512k-1024k and so forth (the arithmetic is sketched in the code just after this list).

3) LVM was then recreated on top of this misaligned RAID. This is where we would be now. For the most part the data would seem to be misaligned by 8k, and shifting the logical volumes by 8k would cause that majority of the data to correct itself. However, because of the error in the RAID beneath the LVM, one 8k piece within each 512k of data would still be shuffled. You might be able to mount the system now, since the XFS header would be in the correct place (I believe this 8k misalignment is the reason you originally could not mount the system, not even read-only). If you mounted this 8k-shifted system and ran a filesystem check on it, then I believe one of two things would happen: either the repair would determine that it can't make sense of the data and plainly fail, or it would attempt to repair the structure and probably corrupt the whole partition.
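
Just to make the arithmetic in 2) concrete, here is the same calculation as a few lines of shell; the chunk number N is arbitrary and the 8k start is the assumed original gap:
Code:

# where does chunk N of a 2-disk stripe live? (512k chunks)
chunk=$((512 * 1024)); N=5

disk=$(( N % 2 ))                    # member disk holding chunk N
old=$(( (N / 2) * chunk + 8192 ))    # original layout: data began 8k in
new=$(( (N / 2) * chunk ))           # recreated layout: no 8k gap
echo "chunk $N: disk $disk, written at byte $old, now read from byte $new"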

How could this have happened? I don't know for certain. What rknichols suggested was that something changed in the RAID settings that caused the header to be 8k shorter than it used to be. I don't know enough about RAID headers to say anything about that - I don't know if their size changes, or what would cause it. Maybe a different metadata version? A change between 1.0 and 1.2 or something? I don't know. What I suspected myself was that originally the disks were partitioned, with a single partition starting at the 8k point on the disk, and the RAID was then created on those partitions (e.g. /dev/sdc1, /dev/sdd1 and so forth). Currently the RAID is made from whole disks (/dev/sdc, /dev/sdd and so forth), whereas the other RAID array seems to be made from partitions (/dev/sda1, /dev/sdb1), so this kind of error would seem feasible - IF the RAID was recreated from scratch, which, as rknichols pointed out, seems likely based on the timestamps (assuming the year on the server was originally set to 2013 instead of 2014 by mistake).

At this point I can't say anything for certain - but this is exactly why I'm suggesting you hold off from doing ANY kind of writes on those disks until you know exactly what's going on there. Using dd to only read from the disks, and then doing whatever tests on those images, seems like the safest way at this point.

-- edit --
One more thing - assuming a disk failed and was replaced properly.. and then a few days later the whole RAID failed.. there's a possibility that you're dealing with something other than regular disk failures: a faulty controller, bad memory chips, a failing power supply, irregular/spiking electricity, or some other issue like that. Something that keeps popping the disks out. Again, difficult to pinpoint and verify, but something to keep an eye out for at least.

littleball 10-22-2014 11:39 AM

I used xfs_check and xfs_repair with -n a few days ago, not with the LV mounted of course, but the results are negative :( they can't find a valid superblock anywhere, which is consistent with what you said: if the filesystem inside the LV is misaligned, any attempt to scan would be useless, since it's searching in a place that probably doesn't belong to XFS. I did try before to mount a dd image of one of the corrupted LVs read-only, to get my data back at least.... I wasn't able to. I don't remember the exact error, but I wasn't able to.

I thought that an inconsistent state of a disk was metadata that was in the buffer but wasn't yet fully written to disk, or was in the process of being written when a power failure or crash suddenly happened. If this is true (and correct me if I'm wrong), the filesystem can still recover from the journal. But in this particular case (the corrupted LV), since everything is misaligned and scattered everywhere, is there a chance the XFS tools can recover it with the metadata in this state?

I have a Slackware server with 1 free partition. I'm going to intentionally create an XFS partition there and change the start and end offsets using sfdisk (trying to make a scenario similar to the corrupted LV). Afterwards I'm going to try to mount it read-only and see if XFS allows me. :) I will share the results after... I still haven't done anything on the production server, but I can run some tests on other servers before a final step is taken on the production one.

Nogitsune 10-22-2014 12:08 PM

Quote:

Originally Posted by littleball (Post 5257772)
I used xfs_check and xfs_repair with -n a few days ago, not with the LV mounted of course, but the results are negative :( they can't find a valid superblock anywhere, which is consistent with what you said: if the filesystem inside the LV is misaligned, any attempt to scan would be useless, since it's searching in a place that probably doesn't belong to XFS. I did try before to mount a dd image of one of the corrupted LVs read-only, to get my data back at least.... I wasn't able to. I don't remember the exact error, but I wasn't able to.

I thought that an inconsistent state of a disk was metadata that was in the buffer but wasn't yet fully written to disk, or was in the process of being written when a power failure or crash suddenly happened. If this is true (and correct me if I'm wrong), the filesystem can still recover from the journal. But in this particular case (the corrupted LV), since everything is misaligned and scattered everywhere, is there a chance the XFS tools can recover it with the metadata in this state?

I have a Slackware server with 1 free partition. I'm going to intentionally create an XFS partition there and change the start and end offsets using sfdisk (trying to make a scenario similar to the corrupted LV). Afterwards I'm going to try to mount it read-only and see if XFS allows me. :) I will share the results after... I still haven't done anything on the production server, but I can run some tests on other servers before a final step is taken on the production one.

You can't create that scenario without creating a RAID, because the real issue isn't the 8k shift in the data but the possibility that the RAID stripes themselves may be aligned wrong. And even if you recreated that kind of scenario with another RAID, it would be pointless, because the real question is whether that has happened here or not. If it has, and if there hasn't been any real write activity to the disks (aside from recreating the LVM metadata), then to the best of my knowledge it should be possible to salvage the data, maybe all of it. But you do have to know exactly what went wrong and how, first. If you attempt the salvage with the wrong settings, it's possible to corrupt some or all of the data.

What rknichols suggested - shifting the pe alignment, dd'ing a small 100G volume into a file, and then mounting that file with a loop device - sounds like a good test to try first. If the RAID stripes are not misaligned, then all the data might magically just appear, and you could copy it out of that loopback without problems. If the RAID stripes are wrong, then you might still be able to mount the loop (because it will find the XFS header), but the data will be either partially or entirely corrupted, and running xfs repair would probably be like taking a blender to an unhatched chick - you can't put it back together afterwards.

rknichols 10-22-2014 01:53 PM

Quote:

Originally Posted by rknichols (Post 5257415)
Looking again at that RAID superblock, it appears to have a decidedly non-default chunk size of 512K (0x400 = 1024 512-byte sectors).

I found further information that indicates that for metadata version 1.2 the default chunk size is 512K. I've edited the original post to reflect that.

rknichols 10-22-2014 02:09 PM

Quote:

Originally Posted by littleball (Post 5257772)
I used xfs_check and xfs_repair with -n a few days ago, not with the LV mounted of course, but the results are negative :( they can't find a valid superblock anywhere, which is consistent with what you said: if the filesystem inside the LV is misaligned, any attempt to scan would be useless, since it's searching in a place that probably doesn't belong to XFS. I did try before to mount a dd image of one of the corrupted LVs read-only, to get my data back at least.... I wasn't able to. I don't remember the exact error, but I wasn't able to.

I was referring back to your attempt in #16, where you set up a read-only loop device with offset 8192 and got failures because xfs_repair insisted that the log needed to be replayed. If you include the "-n" option, it won't do that. You will still get the "attempt to access beyond end of device" error, but might get some indication about whether the rest of the filesystem looks at all sane.
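
Concretely, redoing that attempt with the no-modify flag would look something like this; the LV path is a placeholder for whichever volume you used in #16:
Code:

# read-only loop device, shifted 8192 bytes into the LV as before
losetup -r -o 8192 /dev/loop1 /dev/data/prod_corporativos

# -n reports problems without replaying the log or writing anything;
# expect the "beyond end of device" complaint, but check the rest
xfs_repair -n /dev/loop1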

It's not as good a test as copying a 100GB LV, but as you said, that's hitting the server pretty hard. (It's not really the I/O that's the problem, but AFAIK there is no way to limit the amount of buffer cache that gets used for that operation, and that forces a lot of useful stuff for other processes out of the cache.)

Nogitsune 10-22-2014 02:41 PM

Quote:

Originally Posted by rknichols (Post 5257855)
that's hitting the server pretty hard. (It's not really the I/O that's the problem, but AFAIK there is no way to limit the amount of buffer cache that gets used for that operation, and that forces a lot of useful stuff for other processes out of the cache.)

Maybe with
Code:

dd bs=512k iflag=direct oflag=direct if=/dev/path/to/LV of=/path/to/LV.img
I'm not sure what other performance issues using direct I/O in a live server environment might cause, though. Overall, running it outside business hours would likely be easier, and 100G shouldn't take a terribly long time to copy. If you have to pull a full disk image, it'll be a significantly more difficult situation.

If it's a server you don't want to interrupt to install more disks and such, and if you have a fast enough internal network (1G at least), then you could pull the image to another Linux server over the network using nc (netcat), sometime when the network is free - a weekend maybe. Something like the sketch below.
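
A rough sketch of that netcat approach; the host name, port and paths are made up, and the exact flags vary between netcat implementations (-q 1 is the traditional netcat way to exit once the stream ends):
Code:

# on the receiving server: listen and write the stream to a file
nc -l -p 1234 > /scratch/LV.img

# on the production server: read with direct I/O and pipe it over
dd bs=512k iflag=direct if=/dev/path/to/LV | nc -q 1 receiver-host 1234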

littleball 10-22-2014 04:02 PM

Quote:

Originally Posted by Nogitsune (Post 5257878)
Maybe with
Code:

dd bs=512k iflag=direct oflag=direct if=/dev/path/to/LV of=/path/to/LV.img

I still have 2 hours left before working hours finish. When I get home later tonight, I'm going to log in to the company through VPN and try out the dd with those flags (I didn't know it was possible to limit how much buffer cache I/O operations can request) - I wish I could find something similar to limit Firefox, since that one loves to cache a lot.

Will keep updated, in a few more hours.

Nogitsune 10-22-2014 04:54 PM

Quote:

Originally Posted by littleball (Post 5257919)
I still have 2 hours left before working hours finish. When I get home later tonight, I'm going to log in to the company through VPN and try out the dd with those flags (I didn't know it was possible to limit how much buffer cache I/O operations can request) - I wish I could find something similar to limit Firefox, since that one loves to cache a lot.

Will keep updated, in a few more hours.

Those flags should make dd use direct I/O operations, so it shouldn't cache anything, period. Consequently it might reserve I/O time more aggressively, so I don't know how much impact it would have on the performance of actual disk operations. In any case, I expect that with a decently powerful server you should see data speeds of around 100M/s, so 100G should take roughly 1000 seconds, or around 15 minutes. If it's outside business hours, I don't anticipate many problems with that.

littleball 10-22-2014 06:47 PM

Quote:

Originally Posted by rknichols (Post 5256761)
OK, the title of this thread mentions "full image disk." Do you indeed have backup images of these drives? I'm really hoping the answer is, "Yes," because I have to play with fire a bit here to proceed.

Make two copies of that /etc/lvm/backup/data file. I'll call them data.bak1 and data.bak2. A simple edit can effectively shift all the LVs by 8192 bytes. If that is the only problem, everything will come back like magic. In data.bak2, change the line
Code:

                        pe_start = 2048
to
Code:

                        pe_start = 10240
Then, say a prayer and run
Code:

vgcfgrestore -v -f data.bak2
and see what happens. You should then be able to run
Code:

blkid /dev/mapper/data-*
and see your filesystems. Now, it would be nice if xfs allowed you to check them without writing to them, but it looks like the only choice is to try to mount one read/write (that's why I hope you have a backup).

The LVM change can be easily undone by
Code:

vgcfgrestore -v -f data.bak1
Whatever the xfs mount attempt does to the contents might not be so easily undone.

This is what I've done (am I missing something, since it doesn't work?):

cp /etc/lvm/backup/data /etc/lvm/backup/data1
Edit data1 file:

pe_start = 10240
Code:

[root@storage-batch backup]# vgcfgrestore -v -f data1 data
  Restored volume group data
[root@storage-batch backup]# blkid /dev/mapper/data-*
/dev/mapper/data-lab_templates: PTTYPE="dos"
/dev/mapper/data-lab_vmimages: PTTYPE="dos"
/dev/mapper/data-prod_corporativos: UUID="984ad8ae-449c-4ca3-b5c3-522413edde24" TYPE="ext4"
/dev/mapper/data-prod_vmimages--batch: UUID="c27b4a05-5dd5-4d04-92ce-10483c354238" TYPE="ext4"

So I changed everything back, using the original LVM data file:

Code:

[root@storage-batch backup]# vgcfgrestore -v -f data data
  Restored volume group data

I'm missing something here in the command line; if I do:

vgcfgrestore -v -f data1

I get a message from stdout:

[root@storage-batch backup]# vgcfgrestore -v -f data1
Please specify a *single* volume group to restore.

Code:

[root@storage-batch backup]# vgcfgrestore -l data
                                               
  File:        /etc/lvm/archive/data_00686-442823610.vg
  Couldn't find device with uuid 6jurGs-cu74-U1lw-abrw-vay7-qSyu-wasbhD.
  VG name:      data                                                   
  Description:  Created *before* executing '/usr/sbin/lvcreate -l 100%FREE -s -n prod_corporativos-snap /dev/data/prod_corporativos'
  Backup Time:  Tue Sep 23 03:41:59 2014                                                                                           

 
  File:        /etc/lvm/archive/data_00687-1255820102.vg
  VG name:      data                                   
  Description:  Created *before* executing '/usr/sbin/lvremove -f data/prod_corporativos-snap'
  Backup Time:  Tue Sep 23 03:42:52 2014                                                     

 
  File:        /etc/lvm/archive/data_00688-677008904.vg
  VG name:      data                                   
  Description:  Created *before* executing 'pvscan --cache --activate ay --major 9 --minor 127'
  Backup Time:  Tue Sep 23 13:25:49 2014                                                     

 
  File:        /etc/lvm/archive/data_00689-450113657.vg
  VG name:      data                                   
  Description:  Created *before* executing 'pvscan --cache --activate ay --major 9 --minor 127'
  Backup Time:  Tue Sep 23 13:25:49 2014                                                     

 
  File:        /etc/lvm/archive/data_00690-1818098044.vg
  VG name:      data                                   
  Description:  Created *before* executing 'lvextend -L+700G /dev/data/prod_corporativos'
  Backup Time:  Sat Sep 27 17:47:37 2014                                               

 
  File:        /etc/lvm/archive/data_00691-747592091.vg
  VG name:      data                                   
  Description:  Created *before* executing 'lvresize -L 500G /dev/mapper/data-prod_corporativos'
  Backup Time:  Sat Sep 27 19:07:53 2014                                                       

 
  File:        /etc/lvm/archive/data_00692-562380709.vg
  VG name:      data                                   
  Description:  Created *before* executing '/usr/sbin/lvcreate -l 100%FREE -s -n prod_corporativos-snap /dev/data/prod_corporativos'
  Backup Time:  Thu Oct  2 02:14:12 2014                                                                                           

 
  File:        /etc/lvm/archive/data_00693-804306699.vg
  VG name:      data                                   
  Description:  Created *before* executing '/usr/sbin/lvremove -f data/prod_corporativos-snap'
  Backup Time:  Thu Oct  2 02:33:23 2014                                                     

 
  File:        /etc/lvm/archive/data_00694-337436075.vg
  VG name:      data                                   
  Description:  Created *before* executing '/usr/sbin/lvcreate -l 100%FREE -s -n prod_corporativos-snap /dev/data/prod_corporativos'
  Backup Time:  Fri Oct  3 03:36:26 2014                                                                                           

 
  File:        /etc/lvm/archive/data_00695-348986221.vg
  VG name:      data                                   
  Description:  Created *before* executing '/usr/sbin/lvremove -f data/prod_corporativos-snap'
  Backup Time:  Fri Oct  3 03:37:05 2014                                                     

 
  File:        /etc/lvm/archive/data_00696-823141307.vg
  VG name:      data                                   
  Description:  Created *before* executing '/usr/sbin/lvcreate -l 100%FREE -s -n prod_corporativos-snap /dev/data/prod_corporativos'
  Backup Time:  Tue Oct  7 01:00:04 2014                                                                                           

 
  File:        /etc/lvm/archive/data_00697-1976214777.vg
  VG name:      data                                   
  Description:  Created *before* executing '/usr/sbin/lvremove -f data/prod_corporativos-snap'
  Backup Time:  Tue Oct  7 01:20:16 2014                                                     

 
  File:        /etc/lvm/archive/data_00698-1484824141.vg
  VG name:      data                                   
  Description:  Created *before* executing '/usr/sbin/lvcreate -l 100%FREE -s -n prod_corporativos-snap /dev/data/prod_corporativos'
  Backup Time:  Wed Oct  8 03:50:51 2014                                                                                           

 
  File:        /etc/lvm/archive/data_00699-821721622.vg
  VG name:      data                                   
  Description:  Created *before* executing '/usr/sbin/lvremove -f data/prod_corporativos-snap'
  Backup Time:  Wed Oct  8 03:51:26 2014                                                     

 
  File:        /etc/lvm/archive/data_00700-461400040.vg
  VG name:      data                                   
  Description:  Created *before* executing '/usr/sbin/lvcreate -l 100%FREE -s -n prod_corporativos-snap /dev/data/prod_corporativos'
  Backup Time:  Thu Oct  9 03:52:36 2014                                                                                           

 
  File:        /etc/lvm/archive/data_00701-641499306.vg
  VG name:      data                                   
  Description:  Created *before* executing '/usr/sbin/lvremove -f data/prod_corporativos-snap'
  Backup Time:  Thu Oct  9 03:53:16 2014                                                     

 
  File:        /etc/lvm/archive/data_00702-1887772695.vg
  VG name:      data                                   
  Description:  Created *before* executing '/usr/sbin/lvcreate -l 100%FREE -s -n prod_corporativos-snap /dev/data/prod_corporativos'
  Backup Time:  Fri Oct 10 01:11:10 2014                                                                                           

 
  File:        /etc/lvm/archive/data_00703-1210497069.vg
  VG name:      data                                   
  Description:  Created *before* executing '/usr/sbin/lvremove -f data/prod_corporativos-snap'
  Backup Time:  Fri Oct 10 01:12:10 2014                                                     

 
  File:        /etc/lvm/archive/data_00704-548989363.vg
  VG name:      data                                   
  Description:  Created *before* executing '/usr/sbin/lvcreate -l 100%FREE -s -n prod_corporativos-snap /dev/data/prod_corporativos'
  Backup Time:  Fri Oct 10 03:35:16 2014                                                                                           

 
  File:        /etc/lvm/archive/data_00705-1810326160.vg
  VG name:      data                                   
  Description:  Created *before* executing '/usr/sbin/lvremove -f data/prod_corporativos-snap'
  Backup Time:  Fri Oct 10 03:35:47 2014                                                     

 
  File:        /etc/lvm/archive/data_00706-1965352219.vg
  VG name:      data                                   
  Description:  Created *before* executing '/usr/sbin/lvcreate -l 100%FREE -s -n prod_corporativos-snap /dev/data/prod_corporativos'
  Backup Time:  Tue Oct 14 04:10:32 2014                                                                                           

 
  File:        /etc/lvm/archive/data_00707-490370999.vg
  VG name:      data                                   
  Description:  Created *before* executing '/usr/sbin/lvremove -f data/prod_corporativos-snap'
  Backup Time:  Tue Oct 14 04:29:49 2014                                                     

 
  File:        /etc/lvm/archive/data_00708-16739621.vg
  VG name:      data                                 
  Description:  Created *before* executing '/usr/sbin/lvcreate -l 100%FREE -s -n prod_corporativos-snap /dev/data/prod_corporativos'
  Backup Time:  Wed Oct 15 03:38:02 2014                                                                                           

 
  File:        /etc/lvm/archive/data_00709-663978874.vg
  VG name:      data                                   
  Description:  Created *before* executing '/usr/sbin/lvremove -f data/prod_corporativos-snap'
  Backup Time:  Wed Oct 15 03:57:58 2014                                                     

 
  File:        /etc/lvm/archive/data_00710-392126074.vg
  VG name:      data                                   
  Description:  Created *before* executing '/usr/sbin/lvcreate -l 100%FREE -s -n prod_corporativos-snap /dev/data/prod_corporativos'
  Backup Time:  Thu Oct 16 02:54:26 2014                                                                                           

 
  File:        /etc/lvm/archive/data_00711-646205960.vg
  VG name:      data                                   
  Description:  Created *before* executing '/usr/sbin/lvremove -f data/prod_corporativos-snap'
  Backup Time:  Thu Oct 16 02:55:17 2014                                                     

 
  File:        /etc/lvm/archive/data_00712-699477189.vg
  VG name:      data                                   
  Description:  Created *before* executing '/usr/sbin/lvcreate -l 100%FREE -s -n prod_corporativos-snap /dev/data/prod_corporativos'
  Backup Time:  Fri Oct 17 02:53:26 2014                                                                                           

 
  File:        /etc/lvm/archive/data_00713-2000095834.vg
  VG name:      data                                   
  Description:  Created *before* executing '/usr/sbin/lvremove -f data/prod_corporativos-snap'
  Backup Time:  Fri Oct 17 02:53:56 2014                                                     

 
  File:        /etc/lvm/archive/data_00714-470145143.vg
  VG name:      data                                   
  Description:  Created *before* executing '/usr/sbin/lvcreate -l 100%FREE -s -n prod_corporativos-snap /dev/data/prod_corporativos'
  Backup Time:  Wed Oct 22 01:55:29 2014                                                                                           

 
  File:        /etc/lvm/archive/data_00715-1890993045.vg
  VG name:      data                                   
  Description:  Created *before* executing '/usr/sbin/lvremove -f data/prod_corporativos-snap'
  Backup Time:  Wed Oct 22 01:56:02 2014                                                     

 
  File:        /etc/lvm/archive/data_00716-1843448680.vg
  VG name:      data                                   
  Description:  Created *before* executing 'pvscan --cache --activate ay --major 9 --minor 127'
  Backup Time:  Wed Oct 22 20:44:48 2014                                                     

 
  File:        /etc/lvm/archive/data_00717-197234564.vg
  VG name:      data                                   
  Description:  Created *before* executing 'pvscan --cache --activate ay --major 9 --minor 127'
  Backup Time:  Wed Oct 22 20:44:48 2014


  File:        /etc/lvm/archive/data_00718-1096694472.vg
  VG name:      data
  Description:  Created *before* executing 'pvscan --cache --activate ay --major 9 --minor 127'
  Backup Time:  Wed Oct 22 20:46:35 2014


  File:        /etc/lvm/archive/data_00719-1135019502.vg
  VG name:      data
  Description:  Created *before* executing 'pvscan --cache --activate ay --major 9 --minor 127'
  Backup Time:  Wed Oct 22 20:46:35 2014


  File:        /etc/lvm/archive/data_00720-541870226.vg
  VG name:      data
  Description:  Created *before* executing 'pvscan --cache --activate ay --major 9 --minor 127'
  Backup Time:  Wed Oct 22 20:56:17 2014


  File:        /etc/lvm/archive/data_00721-131402915.vg
  VG name:      data
  Description:  Created *before* executing 'pvscan --cache --activate ay --major 9 --minor 127'
  Backup Time:  Wed Oct 22 20:56:17 2014


  File:        /etc/lvm/backup/data
  VG name:      data
  Description:  Created *after* executing 'pvscan --cache --activate ay --major 9 --minor 127'
  Backup Time:  Wed Oct 22 20:56:17 2014

Any hints?

rknichols 10-22-2014 08:04 PM

Quote:

Originally Posted by littleball (Post 5257983)
cp /etc/lvm/backup/data /etc/lvm/backup/data1
Edit data1 file:

pe_start = 10240
Code:

[root@storage-batch backup]# vgcfgrestore -v -f data1 data
  Restored volume group data
[root@storage-batch backup]# blkid /dev/mapper/data-*
/dev/mapper/data-lab_templates: PTTYPE="dos"
/dev/mapper/data-lab_vmimages: PTTYPE="dos"
/dev/mapper/data-prod_corporativos: UUID="984ad8ae-449c-4ca3-b5c3-522413edde24" TYPE="ext4"
/dev/mapper/data-prod_vmimages--batch: UUID="c27b4a05-5dd5-4d04-92ce-10483c354238" TYPE="ext4"

So I changed everything back, using the original LVM data file:

Code:

[root@storage-batch backup]# vgcfgrestore -v -f data data
  Restored volume group data

I'm missing something here in the command line; if I do:

vgcfgrestore -v -f data1

I get a message from stdout:

[root@storage-batch backup]# vgcfgrestore -v -f data1
Please specify a *single* volume group to restore.

Looks like I messed up twice. The big one is that the units for pe_start are 512-byte sectors, not bytes, so to change the offset by 8192 bytes you need to add 16 to the old value:
Code:

                        pe_start = 2064
The other issue is that, as you discovered, vgcfgrestore needs to be told the name of the VG even though you are giving it a file with just one VG in it.

Sorry about that.
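
Putting both corrections together, the retry would look something like this (run from /etc/lvm/backup, with the VG named data as before):
Code:

cp /etc/lvm/backup/data /etc/lvm/backup/data1
# edit data1:  pe_start = 2048  -->  pe_start = 2064
# (units are 512-byte sectors, so +16 sectors = +8192 bytes)

# vgcfgrestore needs the VG name as well as the file
vgcfgrestore -v -f data1 data
blkid /dev/mapper/data-*

# to undo, restore the untouched original backup file
vgcfgrestore -v -f data data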

littleball 10-23-2014 02:59 PM

Hello guys,

Last night the "raid-check" daemon started to run, and this morning one LV inside the data VG was also corrupted :( I tried to mount it, since it was working nicely yesterday, and I couldn't... I get a message about an unknown fs. I tried to run testdisk on this newly corrupted LV, to no avail; it doesn't find any type of filesystem inside the LV.

I spoke with my boss, and he decided that it's better to delete everything and create the RAID and the VG again. So, I want to thank you all for your help, but this raid+lvm is definitely extremely corrupted, so it's better to back up what is still working and create it all again; it's for the better.

:) You were all very helpful, but sometimes it's better to fix things by deleting everything than to try to keep running something that is damaged. :)

