LinuxQuestions.org (/questions/)
-   Linux - General (https://www.linuxquestions.org/questions/linux-general-1/)
-   -   Recover data from full image disk (https://www.linuxquestions.org/questions/linux-general-1/recover-data-from-full-image-disk-4175522277/)

littleball 10-15-2014 11:06 PM

Recover data from full image disk
 
Hello.

Long story short, I'm running a server (Fedora 20) with 4 hard disks, with RAID 10 and LVM. For some reason the RAID 10 got corrupted and I lost the LVM table. I was able to recover the RAID 10 and the LVM table, but when I try to mount one of the logical volumes, I get the typical:

mount: wrong fs type, bad option, bad superblock on ..etc...etc.

Each disk of the raid+LVM is fully formatted as XFS, without partitions (which means the raid disks were formatted as: mkfs.xfs /dev/sda -- no partitions, nothing like /dev/sda1 or /dev/sda2, no boot flag, no nothing).

I tried to fix each LV with xfs_check and xfs_repair; it was useless, it didn't work. I have data inside those LVs I don't want to lose. I have tried dd if=/dev/data/logical-name of=something.img, but I am not able to mount the resulting disk image either. Using loop, when I tried to mount the whole disk image, I get an error message:


NTFS signature is missing.
Failed to mount '/dev/loop0': Invalid argument
The device '/dev/loop0' doesn't seem to have a valid NTFS.
Maybe the wrong device is used? Or the whole disk instead of a
partition (e.g. /dev/sda, not /dev/sda1)? Or the other way around?

Is there anything I can do to recover the data inside the LVs? The RAID is OK according to mdstat, and I'm able to view the VG and LVs with vgscan and pvscan, and they're active.

Please help :)

Nogitsune 10-16-2014 02:45 PM

Are you mounting the filesystem - especially the loop device - with the '-t xfs' option for the XFS file system? Although if the disk itself refuses to mount, then it seems unlikely that the loop device would mount either. Also, are you sure the LVM metadata you recovered was the latest version? Did you get it from the disks, or from the backup under /etc?

jefro 10-16-2014 03:32 PM

Can't hurt to try testdisk on a distro that supports LVM and XFS at the level you have.

Odd such a disaster.

littleball 10-17-2014 10:48 AM

I did try several times to mount the img file using "-t xfs"; unfortunately I wasn't able to, I got the message "Unknown filesystem type, etc, etc".

If I use testdisk (I am not very clever with this tool :) ) and I select my corrupted LVM, I'm able to see my old data there (2 virtual machines) and I'm able to list the files inside those 2 virtual machines. But how do I save an image of these 2 virtual machines inside my corrupted LVM? I only see an option to copy the files inside my 2 virtual machines, but I want to copy or save each whole virtual machine as an .img or something. Does testdisk allow this, or am I only able to save, file by file, whatever testdisk finds inside my corrupted LVM?

Nogitsune 10-17-2014 03:08 PM

I'm not really sure that I understand the situation. You're saying that full disks were formatted with mkfs.xfs, but that you have lvm on top of raid (and raid obviously on top of disks). From this I'm assuming what you mean is that the full disks are used as raid disks (instead of using raid partitions).. and the raid device would then likely be used as physical volume for LVM.

You're also talking about virtual machines inside the LVM (and seeing files inside them). I can only assume you mean logical volumes inside the physical volumes.

If this is the situation, it's pretty similar to what I had - except I used partitions for raid, and my raid was level 6 instead of 10. If you've gotten far enough that you managed to recover the logical volumes, yet you can't actually mount them, then it actually doesn't sound very good to me. If you can see the files with testdisk, and can copy them per-file, then I'd try to recover a couple of files like that, just to see if they come out fine, or if they are corrupt. Preferably use files that are at least a few megabytes in size so they'll take up several blocks on disk - to see that they are consistent across multiple blocks.

If the files come out fine, then it seems that the actual data on the disks is still fine - which of course is good. I'd assume then that there's either something wrong with the partition's superblock (which makes it unrecognizable for mount), or the LVM was recovered with outdated metadata (which might cause it to set the logical volumes to wrong addresses). I'm sure there are other possible explanations, but those two come to mind first.

If the files themselves turn out corrupt, or are unrecoverable, then something in the partition's structure itself would likely be wrong. Raid 10 is mirrored disks striped together, so possibly striping the mirrors in the wrong order might cause something like that. In that case switching the stripes to the right order might correct things (then again, if it's initially right and you swap it to the wrong order, bad things might happen).
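The stripe-order point can be illustrated with a toy model. This is just a sketch assuming a hypothetical RAID 10 of two mirror pairs and a tiny 4-byte chunk size - not your actual array geometry:

```python
# Toy RAID 10: data is striped in CHUNK-sized pieces across two
# mirror pairs (pair_a, pair_b). Hypothetical sizes for illustration.
CHUNK = 4
data = b"AAAABBBBCCCCDDDD"  # logical data as the filesystem wrote it

chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
pair_a = b"".join(chunks[0::2])  # even chunks land on mirror pair A
pair_b = b"".join(chunks[1::2])  # odd chunks land on mirror pair B

def assemble(first, second):
    """Re-stripe two mirror pairs back into one logical device."""
    out = bytearray()
    for i in range(0, len(first), CHUNK):
        out += first[i:i + CHUNK] + second[i:i + CHUNK]
    return bytes(out)

print(assemble(pair_a, pair_b))  # correct order: original data back
print(assemble(pair_b, pair_a))  # swapped order: chunks interleaved wrong
```

With the pairs in the right order the original byte stream comes back; with them swapped, every chunk lands in the wrong place, which is exactly the kind of whole-filesystem corruption described above.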

I'm not sure what else to suggest. You could try taking dd dump out of the top level raid device - for example if the raid device is md1, doing:

Code:

# dd if=/dev/md1 of=/some/path/MD1_DUMP.img bs=1M count=100
That should write 100M from the start of the raid device to a data file. You could then look at the file with either 'hexdump' or just plain 'less'. Looking at it with less should show some garbage at first, but pretty soon come to the LVM metadata - which should show a sequential list of metadata versions, similar to what I wrote in the thread I made about fixing my raid/lvm partitions. You could use that to find the highest sequence number, and make sure the header you used to fix the LVM was the latest one.
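As a sketch of that last step: LVM's text metadata is plain ASCII embedded in the metadata area, and each version carries a "seqno = N" field, so a small script can pull the highest one out of the dump. The dump path is just the example name from the dd command above:

```python
import re

def latest_lvm_seqno(dump_path):
    """Scan a raw dump for LVM text-metadata 'seqno = N' entries
    and return the highest sequence number found (None if absent)."""
    with open(dump_path, "rb") as f:
        raw = f.read()
    # The metadata is embedded as plain text inside binary data,
    # so search the raw bytes directly.
    seqnos = [int(m) for m in re.findall(rb"seqno\s*=\s*(\d+)", raw)]
    return max(seqnos) if seqnos else None

# e.g. latest_lvm_seqno("/some/path/MD1_DUMP.img")
```

If the number this reports is higher than the seqno of the metadata used for the restore, the LVM was likely recovered from an outdated copy.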

metaschima 10-17-2014 03:13 PM

It is likely that you will have to carve data out of the disk using testdisk/photorec or foremost.

Remember to use ddrescue to image a bad drive as dd isn't designed to handle errors well.

jefro 10-17-2014 03:49 PM

One might use dd or ddrescue to save off all the data to an external disk for recovery. It won't fix anything as such, but it could be a way to work on it on a remote system.

I think I recall that testdisk offers a basic way to choose a path where you want to save data off to. It may be possible to save data within the corrupted filesystem, but I'd only do that for files I didn't care about.

Double check the testdisk docs for usage.

Nogitsune 10-18-2014 04:47 AM

Just one thing... I assumed at first what you wrote on the original post was some kind of mistake or oversight:

Quote:

Each disk of the raid+LVM is fully formatted as XFS, without partitions (which means the raid disks were formatted as: mkfs.xfs /dev/sda -- no partitions, nothing like /dev/sda1 or /dev/sda2, no boot flag, no nothing).
BUT! If you really did exactly what you wrote... you first created the raid 10, then the lvm system on top of it.. but at the last step you actually formatted the full drives. NOT the raid partitions, NOT the logical volumes - but in fact the whole underlying hard drive. What you end up with is indeed a fully corrupt raid and lvm system - because you just destroyed their metadata with your XFS filesystem. At this point you could use the XFS filesystem just fine by mounting the whole drive.

But if you then restore the raid and lvm metadata to the disk AFTER you have used it as a regular XFS partition.. what results is, you now have a functioning raid + lvm system, but this time you corrupted your XFS filesystem's metadata. And that, I'm afraid, is most likely irrecoverable.

If that is the case, then in all likelihood the best you can do is to try to recover bits and pieces of the data with recovery tools (such as the testdisk mentioned earlier in thread). However I'm not entirely sure of the situation.. can you repeat EXACTLY what you have done, every step of the way, with every detail? Depending on where and how your data actually was, it may or may not be possible to recover it. So what were the exact commands you used to partition the raids and lvm, to format the disks, to mount them, to try and recover them with testdisk?

syg00 10-18-2014 05:39 AM

What he said.
If you followed a page on the web, post a link so we can see what you did - or were directed to do. As stated, we need to know exact details. And was this a working environment that "went bad"?
What changed?

littleball 10-20-2014 07:37 AM

Excuse me for my bad english :)

I'll try to explain the best I can. I didn't create the raid, lvm, format, etc. on this server; it's a server here at the company I work for. It was working fine, but suddenly one morning the raid was corrupted (I don't know the reasons). I'm 100% sure it wasn't someone who logged in and did something nasty; I think it was a hardware problem or the like. Well, as far as I know, the setup was initially made as:

1 - RAID 10 with 4 disks
2 - LVM on top of that
3 - Volume Group
4 - 4 Logical Volumes
5 - Each Logical Volume formatted as XFS (each logical volume is one disk; there are 4 disks in the raid and I have 4 logical volumes)

Sorry if I expressed myself as saying they formatted the whole raid disk with XFS; it wasn't that. They formatted each logical volume as XFS. Each logical volume is one disk of the raid; they didn't make partitions on each disk, they just created an LV on each disk and formatted each one as XFS (again, sorry for my bad english).

The thing is, I'm able to dump an image of the corrupted logical volume with dd, but I am not able to mount it even with loop; I get the bad superblock error message. Linux does recognize the raid disks and does recognize the LVM, but doesn't recognize that those logical volumes are XFS partitions (like when you run the cfdisk /dev/disk command and see your partitions with their filesystem type shown). In my case, if I run cfdisk on the logical volume, I see the partition with filesystem type "Linux" instead of "XFS" (I just ran cfdisk to look; I haven't formatted, written, or anything). The filesystem type is lost. I would like to recover the filesystem type without losing data. I don't know if this is possible, but in case someone knows :) I will be happy.

rknichols 10-20-2014 10:38 AM

Quote:

Originally Posted by littleball (Post 5256457)
in my case if I run cfdisk on the logical volume, ...

Why would you do that? The LV is an XFS filesystem, not a partitioned volume. It does not (or should not) contain a partition table. If you are seeing anything that looks like a partition table there, it suggests that someone did try to run a partitioning tool on that volume and overwrote the XFS filesystem header in the process.

What do you get when you run
Code:

hexdump -C /dev/mapper/<volgroup_name>-<lvol-name> | head -24
for one of the affected LVs?

Nogitsune 10-20-2014 10:49 AM

Ok, the next question is, what exactly did you do to recover back the RAID and LVM? For example, are you certain that the RAID was recovered with right disks in each mirror, and the stripes restored in right order, with right chunk size, correct RAID metadata version etc. And as for LVM, are you certain that the LVM metadata was restored from the latest version?

You said you tried using xfs_repair, and I believe it should look for backup superblocks. If it can't find them, or if those too are corrupted, then I think there's likely something fundamentally wrong with the filesystem. There's 'xfs_db' that might be of help in trying to further analyze what's wrong with the system - but it goes beyond my knowledge.

Posting the output you get from 'xfs_check' and 'xfs_repair -n' on each of the logical volumes might also help for people to get a better understanding of what has gone wrong.

littleball 10-20-2014 12:08 PM

Quote:

Originally Posted by rknichols (Post 5256536)
Why would you do that? The LV is an XFS filesystem, not a partitioned volume. It does not (or should not) contain a partition table. If you are seeing anything that looks like a partition table there, it suggests that someone did try to run a partitioning tool on that volume and overwrote the XFS filesystem header in the process.

What do you get when you run
Code:

hexdump -C /dev/mapper/<volgroup_name>-<lvol-name> | head -24
for one of the affected LVs?

This is what I get:

[root@storage-batch ~]# hexdump -C /dev/mapper/data-lab_templates | head -24
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000001b0 00 00 00 00 00 00 00 00 f7 fa 55 3d 00 00 00 20 |..........U=... |
000001c0 21 00 83 2a ac fe 00 08 00 00 00 f8 7f 0c 00 00 |!..*............|
000001d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000001f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 aa |..............U.|
00000200 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00002000 58 46 53 42 00 00 10 00 00 00 00 00 01 90 00 00 |XFSB............|
00002010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00002020 13 d8 d7 ac 53 7b 4f 48 86 56 56 ba 11 15 ce 35 |....S{OH.VV....5|
00002030 00 00 00 00 01 00 00 04 00 00 00 00 00 00 00 80 |................|
00002040 00 00 00 00 00 00 00 81 00 00 00 00 00 00 00 82 |................|
00002050 00 00 00 01 00 64 00 00 00 00 00 04 00 00 00 00 |.....d..........|
00002060 00 00 32 00 b4 b4 02 00 01 00 00 10 00 00 00 00 |..2.............|

Sorry about the cfdisk CLI; I thought that using cfdisk I was going to be able to see the filesystem type of the LVM (even if I'm sure it is XFS, I just wanted to see if the system recognizes it as XFS). I didn't write, edit, change, format, etc.; I didn't do anything under cfdisk, just watched :)

Nogitsune:

Unfortunately, I can't answer you about what was done to recover the raid and LVM; I didn't do it :( My boss did something to recover it. Honestly, I don't know how he recovered the RAID and the LVM, but whatever he did, it did work, since the raid and LVM were recovered.

This is what I get with xfs_check:

[root@storage-batch ~]# xfs_check /dev/mapper/data-lab_templates
xfs_check: /dev/mapper/data-lab_templates is not a valid XFS filesystem (unexpected SB magic number 0x00000000)
xfs_check: WARNING - filesystem uses v1 dirs, limited functionality provided.
xfs_check: read failed: Invalid argument
cache_node_purge: refcount was 1, not zero (node=0x15c3950)
xfs_check: cannot read root inode (22)
bad superblock magic number 0, giving up

And with xfs_repair:

[root@storage-batch ~]# xfs_repair /dev/mapper/data-lab_templates
Phase 1 - find and verify superblock...
bad primary superblock - bad magic number !!!

attempting to find secondary superblock...
.................................................................................................... ................................................................found candidate secondary superblock... unable to verify superblock, continuing...
[etc.]
...Sorry, could not find valid secondary superblock
Exiting now.

When I run testdisk, I am able to see all the data that is inside this corrupted LVM, but I am not able to dump it out. I'm afraid that if I choose to "write" what testdisk found, maybe I could lose everything... that's why I haven't done it.

Any help with this XFS LVM would be highly appreciated :)

rknichols 10-20-2014 12:55 PM

Quote:

Originally Posted by littleball (Post 5256604)
[root@storage-batch ~]# hexdump -C /dev/mapper/data-lab_templates | head -24
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000001b0 00 00 00 00 00 00 00 00 f7 fa 55 3d 00 00 00 20 |..........U=... |
000001c0 21 00 83 2a ac fe 00 08 00 00 00 f8 7f 0c 00 00 |!..*............|
000001d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000001f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 aa |..............U.|
00000200 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00002000 58 46 53 42 00 00 10 00 00 00 00 00 01 90 00 00 |XFSB............|
00002010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00002020 13 d8 d7 ac 53 7b 4f 48 86 56 56 ba 11 15 ce 35 |....S{OH.VV....5|
00002030 00 00 00 00 01 00 00 04 00 00 00 00 00 00 00 80 |................|
00002040 00 00 00 00 00 00 00 81 00 00 00 00 00 00 00 82 |................|
00002050 00 00 00 01 00 64 00 00 00 00 00 04 00 00 00 00 |.....d..........|
00002060 00 00 32 00 b4 b4 02 00 01 00 00 10 00 00 00 00 |..2.............|

Sorry about the cfdisk CLI; I thought that using cfdisk I was going to be able to see the filesystem type of the LVM (even if I'm sure it is XFS, I just wanted to see if the system recognizes it as XFS). I didn't write, edit, change, format, etc.; I didn't do anything under cfdisk, just watched :)

That is really weird. I do see a partition table there with a single partition starting 1 Megabyte (2048 sectors) into the volume:
Code:

    Device Boot      Start        End      Blocks  Id  System
/dev/loop0p1          2048  209715199  104856576  83  Linux

But, below that I see an XFS filesystem header at byte offset 8192 (0x2000). I can't imagine how it got that way. Try this:
Code:

losetup -r -o 8192 -f --show /dev/mapper/data-lab_templates
Then see if xfs_check can make sense of that loop device. (I've deliberately made that a read-only mapping to block possible disaster if xfs_check tries to "fix" anything.)
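If it helps, that 8192-byte offset can also be found programmatically by scanning the start of the volume for the XFS superblock magic "XFSB" on a 512-byte boundary. A minimal sketch (the path argument and the 1 MiB scan limit are illustrative assumptions):

```python
XFS_MAGIC = b"XFSB"   # magic bytes at the start of an XFS superblock
SECTOR = 512

def find_xfs_offset(path, limit=1024 * 1024):
    """Scan the first `limit` bytes of a device/image for an XFS
    superblock on a sector boundary; return its byte offset or None."""
    with open(path, "rb") as f:
        head = f.read(limit)
    for off in range(0, max(len(head) - 4, 0), SECTOR):
        if head[off:off + 4] == XFS_MAGIC:
            return off
    return None

# e.g. find_xfs_offset("/dev/mapper/data-lab_templates")
```

If the hexdump above is representative, running this against the LV should report 8192 - the value to pass to losetup -o.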

littleball 10-20-2014 01:35 PM

Hello rknichols.

You were right. Indeed there is a partition inside the logical volume:

[root@storage-batch prueba]# fdisk /dev/mapper/data-lab_templates
Welcome to fdisk (util-linux 2.23.2).

Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.


Command (m for help): p

Disk /dev/mapper/data-lab_templates: 107.4 GB, 107374182400 bytes, 209715200 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 524288 bytes / 1048576 bytes
Disk label type: dos
Disk identifier: 0x3d55faf7

Device Boot Start End Blocks Id System
/dev/mapper/data-lab_templates1 2048 209715199 104856576 83 Linux

Command (m for help): q


Since I was unaware of this, I asked my boss about it. He said he did it because the Linux "mount" command requires a partition table to work??? However, he said he didn't format the LV, he just wrote the partition table, and he insists the old data is still there...

I did try what you suggested:

[root@storage-batch prueba]# losetup -r -o 8192 -f --show /dev/mapper/data-lab_templates
/dev/loop0
[root@storage-batch dev]# xfs_check /dev/loop0
xfs_check: error - read only 0 of 512 bytes
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_check. If you are unable to mount the filesystem, then use
the xfs_repair -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

So I ran losetup -d /dev/loop0 and tried it again (this time trying to mount it):
[root@storage-batch prueba]# losetup -r -o 8192 -f --show /dev/mapper/data-lab_templates
/dev/loop0
[root@storage-batch dev]# mount /dev/loop0 -o loop /prueba
mount: /dev/loop0 is write-protected, mounting read-only
mount: cannot mount /dev/loop0 read-only


And in dmesg:

[850196.127770] loop0: rw=32, want=209715200, limit=209715184
[850196.127846] XFS (loop0): Mounting Filesystem
[850196.135166] XFS (loop0): recovery required on read-only device.
[850196.135229] XFS (loop0): write access unavailable, cannot proceed.
[850196.135284] XFS (loop0): log mount/recovery failed: error 30
[850196.135492] XFS (loop0): log mount failed
[850211.171322] attempt to access beyond end of device



I do have 3 more LVs in the same situation (they are inside the same VG); in the other 3 LVs I didn't see any partition inside. But I am not able to mount them as XFS either (I get the same results with xfs_repair and xfs_check, the bad superblock thing). This is one of the other corrupted LVs under XFS:

[root@storage-batch /]# hexdump -C /dev/mapper/data-lab_vmimages | head -24
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000001b0 00 00 00 00 00 00 00 00 93 25 59 88 00 00 00 00 |.........%Y.....|
000001c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000001f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 aa |..............U.|
00000200 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00002000 58 46 53 42 00 00 10 00 00 00 00 00 10 00 00 00 |XFSB............|
00002010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00002020 0d a7 59 f1 f1 d6 4a af 8c ab 2e 5e 1f 79 95 1f |..Y...J....^.y..|
00002030 00 00 00 00 04 00 00 04 00 00 00 00 00 00 00 80 |................|
00002040 00 00 00 00 00 00 00 81 00 00 00 00 00 00 00 82 |................|
00002050 00 00 00 01 01 f4 00 00 00 00 00 09 00 00 00 00 |................|
00002060 00 00 fa 00 b4 b4 02 00 01 00 00 10 00 00 00 00 |................|
00002070 00 00 00 00 00 00 00 00 0c 09 08 04 19 00 00 19 |................|
00002080 00 00 00 00 00 00 15 40 00 00 00 00 00 00 09 fb |.......@........|
00002090 00 00 00 00 01 de 79 94 00 00 00 00 00 00 00 00 |......y.........|

