Quote:
Here is a suggestion. Go ahead and adjust the LVM pe_start value, but don't try to mount or check anything. Now use dd to copy an image of just one of the LVs to another device. Start with one of the smaller ones, like prod_portables or lab_templates (just 100GB each). Then try to mount and check that new image. That will leave the original data safe, but tell you what would have happened had you tried that on the original. That's a lot quicker than imaging the whole 3.6TB VG.
Hello guys,
From now on, my boss is into this too :) (I sent him the URL of this post, he read it and agrees with all the comments). We haven't done anything yet, since we're talking about a production server; on working days it needs to be up and running... except weekends. For the moment we're thinking of recreating some of your suggestions, but on a dd image like rknichols suggested... it's the safest way. It's a task that's going to take a few days, since we cannot overload the server too much during working hours (and we know how dd eats CPU). Probably a few of you already want to know how all this is going to end (me too); I'll keep you updated on all the steps we're going to go through in the following days.

The XFS wiki page indicates that it is possible to mount an XFS partition read-only and use the XFS tools to try to repair it, but for some reason I am unable to mount it read-only, so maybe the wiki was talking about a very old version of XFS, or this XFS was compiled with some attribute or parameter that doesn't allow read-only mounts. This sucks, since other filesystems do allow the system to be mounted read-only. I imagine if the root partition were XFS and something nasty happened, you couldn't boot your Linux into single-user mode? It doesn't matter what the real cause is now; I guess we need to work with what we have.

Nogitsune: Unfortunately, I can't tell if the current chunk size of the disks in the raid is the same as the original. But I can tell you that about 2 weeks before this raid got "mysteriously" damaged, one of the 4 disks failed, so my co-worker removed the failed disk and left the raid working with 3 disks, and a few days later he inserted a new disk into this raid. The raid was working fine with the 4 disks, and then suddenly one morning there was no raid, no LVM, nothing... We still don't know what happened or what caused this.
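For what it's worth, stock XFS does support read-only mounts; what it usually refuses is a mount where the log cannot be replayed, unless recovery is skipped. A minimal sketch (the LV path and mountpoint here are placeholders, not confirmed from this setup):

Code:
# Read-only mount that skips XFS log replay; nothing is written to the device.
mount -t xfs -o ro,norecovery /dev/data/prod_portables /mnt/test

If even that fails with a superblock error, it points at the filesystem start not being where the LV thinks it is, rather than at a compile-time restriction.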
I can't really say how your raid implementation handles the spare disk. If it were the kind of software raid I use myself, I'd have to run the mdadm command to add the spare to the array, and then the reconstruction would do its magic and all would be fine (I use raid on disk partitions, so I'd first need to partition the new disk and then add the partition to the raid instead of the whole disk). However, in your case it's possible the whole thing worked automatically, or maybe the co-worker in question ran the needed commands to add it. I suppose it's also possible that the disk was never actually added to the array... and then 10 days later another disk from the same mirror fell out of the raid and caused the whole thing to fail. I couldn't really say.
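For illustration only, the usual mdadm sequence for replacing a failed member would look something like this; the array and disk names are assumptions, not taken from this server:

Code:
# Drop the failed member, then add the replacement; resync starts automatically.
mdadm /dev/md1 --remove /dev/sdc1
mdadm /dev/md1 --add /dev/sdc1
cat /proc/mdstat        # watch the rebuild progress

If nothing like that was ever run, the array may have kept going degraded until the second disk dropped out.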
Either way, it seems right now the raid is up and running, and hopefully there is no severe corruption of the data. What I'm mostly worried about is the possibility of the following scenario:

1) For one reason or another, the raid failed and wouldn't reassemble properly. To restore it, the decision was to do something akin to 'mdadm --create' - basically to recreate the whole raid from the original disks, writing new headers etc. If this is done exactly right, then the raid will be restored, and you'll be able to access your original data with no further losses. This is basically what I did for my own failed raid6, and at the end of the day I got everything important restored. But...

2) Something went wrong. What I feel is most likely is that, as rknichols pointed out, the raid is for some reason misaligned by 8k, and I'm worried about what this does to the original data. Striped raid on two disks works by writing 'chunk-size' amounts of data (512k in this case) alternating between disk 0 and disk 1, so for each 1M of data, the first half is written to disk 0 and the second half to disk 1. Now, ignoring the raid header, assume that the original raid was aligned on each disk at a position that starts from 8k. Chunks would be written at locations (0+8k) = 8k, (512k+8k) = 520k, (1024k+8k) = 1032k and so on. To read a long strip of data, you'd assemble it from disk0:8k->520k + disk1:8k->520k + disk0:520k->1032k + disk1:520k->1032k and so forth. Now, if the raid was recreated but without that 8k offset at the start of the disk (as it seems), then the original data still sits in those same chunks (8-520, 520-1032 and so on), but the new raid will think the data is instead in chunks 0-512, 512-1024 and so forth.

3) LVM was then recreated on top of this misaligned raid. This is where we would be now. For the most part the data would appear to be shifted by 8k, and shifting the logical volumes by 8k would cause that majority of the data to correct itself. However, because of the error in the raid beneath the LVM, the 8k piece within each 512k of data would still be shuffled. You might be able to mount the system now, since the XFS header would be in the correct place (originally, I believe this 8k misalignment is the reason you could not mount the system, not even read-only). If you now mounted this 8k-shifted system and ran a file check on it, then I believe one of two things would happen: either the repair would determine that it can't make sense of the data and plainly fail, or it would attempt to repair the structure and probably corrupt the whole partition.

How could this have happened? That I don't know for certain. What rknichols suggested was that something changed in the raid settings that caused the header to be 8k shorter than it used to be. I don't know enough about raid headers to say anything about that - I don't know if their size changes, or what would cause it. Maybe a different metadata version? A change between 1.0 and 1.2 or something? I don't know. What I was suspecting myself was that originally the disks were partitioned, with a single partition that starts at the 8k point on the disk... and then the raid was created on these partitions (e.g. /dev/sdc1, /dev/sdd1 and so forth).
Currently the raid is made from the whole disks (/dev/sdc, /dev/sdd and so forth), whereas the other raid array seems to be made from partitions (/dev/sda1, /dev/sdb1), so this kind of error would seem feasible - IF the raid was recreated from scratch, which, like rknichols pointed out, would seem likely based on the timestamps (assuming the year of the server was originally set to 2013 instead of 2014 by mistake). At this point I can't say anything for certain - but this is exactly why I'm suggesting to hold off from doing ANY kind of writes on those disks until you know exactly what's going on there. Using dd to only read from the disks, and then doing whatever tests with those images, seems like the safest way at this point.

-- edit --

One more thing: assuming a disk failed and was replaced properly... and then a few days later the whole raid failed... there's a possibility that you're dealing with something other than regular disk failures. A faulty controller, bad memory chips, a failing power supply, irregular/spiking electricity, or some other issue like that - something that keeps popping the disks out. Again, difficult to pinpoint and verify, but something to keep an eye out for at least.
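One non-destructive way to check those layout details (metadata version, data offset, creation time, whole-disk vs. partition members) would be something like the following; the device names are only examples:

Code:
# Read-only inspection of the raid members and the assembled array.
mdadm --examine /dev/sdc /dev/sdd    # per-member metadata version, data offset, creation time
mdadm --detail /dev/md1              # chunk size and member list of the assembled array
cat /proc/mdstat                     # quick overview of all arrays

Comparing the data offset and creation time between the two arrays might confirm or rule out the recreate-from-scratch theory.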
I used xfs_check and xfs_repair with -n a few days ago, not with the LV mounted of course, but the results are negative :( They can't find a valid superblock anywhere, which is consistent with what you said: if the filesystem inside the LV is misaligned, any attempt to scan would be useless, since it is searching in a place that probably doesn't belong to XFS. I did try before to mount a dd image of one of the corrupted LVs read-only, to get my data back at least... I wasn't able to. I don't remember the exact error, but I wasn't able.
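If the 8k-shift theory is right, the XFS superblock magic string "XFSB" should turn up at byte 8192 instead of byte 0. A small read-only check, run against an image copy (the path is a placeholder):

Code:
# Scan the first MiB of the copied LV for the XFS superblock magic.
# A first hit at offset 0 means the filesystem starts where expected;
# a first hit at 8192 would match the suspected 8k shift.
dd if=/path/to/LV.img bs=1M count=1 2>/dev/null | grep -abo XFSB | head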
I thought that an inconsistent state of a disk meant metadata that was in the buffer but wasn't yet fully written to disk, or was in the process of being written when a power failure or crash suddenly happened; if this is true (and correct me if I'm wrong), the filesystem can still recover from the journal. But in this particular case (the corrupted LV), since everything is misaligned and scattered everywhere, is there a chance the XFS tools can recover with the metadata in this state? I have a Slackware server with one free partition; I'm going to intentionally create an XFS filesystem on it and change the start and end offsets using sfdisk (trying to make a scenario similar to the corrupted LV). Afterwards, I'm going to try to mount it read-only and see if XFS allows me. :) I will share the results after... We still haven't done anything on the production server, but we can run some tests on other servers before a final step is taken on the production one.
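A similar experiment can also be done without repartitioning anything, by viewing a scratch image through a loop device with an offset; all paths here are made up:

Code:
# Simulate an 8k misalignment on a throwaway file instead of a real partition.
truncate -s 1G /tmp/xfstest.img
mkfs.xfs -f /tmp/xfstest.img
losetup -o 8192 --find --show /tmp/xfstest.img       # prints e.g. /dev/loop0
mount -t xfs -o ro,norecovery /dev/loop0 /mnt/test   # expected to fail: superblock not at offset 0

That reproduces, in miniature, what a filesystem shifted by 8k looks like to the mount and repair tools.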
Quote:
What rknichols suggested - shifting the pe alignment and then dd'ing a small 100G partition into a file, and then mounting that file with a loop device - sounds like a good test to try first. If the raid stripes are not misaligned, then all the data might magically just appear, and you could copy it out of that loopback without problems. If the raid stripes are wrong, then you might still be able to mount the loop (because it will find the XFS header), but the data will be either partially or entirely corrupted, and xfs repair will probably be like taking a blender to an unhatched chick - you can't put it back together afterwards.
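Roughly, once the 100G image exists, the read-only test might look like this (image path, loop device and mountpoint are placeholders):

Code:
# Attach the copied image to a loop device and check it without writing anything.
losetup --find --show /path/to/LV.img                # prints the loop device, e.g. /dev/loop0
xfs_repair -n /dev/loop0                             # -n = report problems only, never writes
mount -t xfs -o ro,norecovery /dev/loop0 /mnt/test

Everything here happens on the copy, so the worst case is just a failed mount.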
Quote:
It's not as good a test as copying a 100GB LV, but as you said, that's hitting the server pretty hard. (It's not really the I/O that's the problem, but AFAIK there is no way to limit the amount of buffer cache that gets used for that operation, and that forces a lot of useful stuff for other processes out of the cache.)
Quote:
Code:
dd bs=512k iflag=direct oflag=direct if=/dev/path/to/LV of=/path/to/LV.img

If it's a server you don't want to interrupt for installing more disks and such, and if you have a fast enough internal network (1G at least), then you could pull the image to another Linux server over the network using nc (netcat), sometime when the network is free - a weekend, maybe.
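A rough sketch of that netcat transfer; the hostname, port and paths are invented, and the listen syntax varies between netcat variants:

Code:
# On the receiving server (traditional netcat; OpenBSD nc uses "nc -l 1234"):
nc -l -p 1234 > /backup/LV.img

# On the storage server, stream the LV with direct reads to spare the page cache:
dd bs=512k iflag=direct if=/dev/path/to/LV | nc backup-host 1234

Running a checksum (md5sum) on both ends afterwards would confirm the copy arrived intact.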
Quote:
Will keep you updated, in a few more hours.
Quote:
cp /etc/lvm/backup/data /etc/lvm/backup/data1

Edit the data1 file: pe_start = 10240

Code:
[root@storage-batch backup]# vgcfgrestore -v -f data1 data

Code:
[root@storage-batch backup]# vgcfgrestore -v -f data data

With vgcfgrestore -v -f data1 I get a message from stdout:

Code:
[root@storage-batch backup]# vgcfgrestore -v -f data1
Please specify a *single* volume group to restore.

Code:
[root@storage-batch backup]# vgcfgrestore -l data
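For what it's worth, a cautious version of that sequence might look like this (a sketch only, using the file names from this thread; the --test run makes no changes on disk):

Code:
vgcfgbackup data                                        # keep an extra copy of the current metadata
cp /etc/lvm/backup/data /etc/lvm/backup/data1           # edit pe_start in data1
vgcfgrestore --test -v -f /etc/lvm/backup/data1 data    # dry run
vgcfgrestore -v -f /etc/lvm/backup/data1 data           # the VG name must be given along with -f

The "Please specify a *single* volume group" message just means the VG name argument was missing.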
Quote:
Code:
pe_start = 2064

Sorry about that.
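For context, assuming pe_start in the backup file is counted in 512-byte sectors and the original value was 2048 (the usual 1MiB default), an 8k shift works out to 16 sectors:

Code:
# 8 KiB / 512 bytes per sector = 16 sectors; 2048 + 16 = 2064
echo $(( 2048 + (8 * 1024) / 512 ))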
Hello guys,
Last night the "raid-check" daemon started to run, and this morning one LV inside the data VG was also corrupted :( I tried to mount it, since it was working nicely yesterday, and I couldn't... I get a message about an unknown fs. I tried to run testdisk on this newly corrupted LV, to no avail; it doesn't find any type of filesystem inside the LV. I spoke with my boss, and he decided that it's better to delete everything and create the raid and the VG again. So, I want to thank you all for your help, but this raid+lvm is definitely extremely corrupted, so it's better to back up what is still working and create it all again; it's for the better. :) You were all very helpful, but sometimes it's better to fix by deleting everything than to try to keep running on top of something that is damaged. :)