Seeking help & support with recovery of a vol grp held on a degraded RAID 1 array
Good evening all.
I am looking for some support in the recovery of a corrupted or damaged volume group. This is a personal server I built a while ago as a learning exercise and a useful backup server; it is, or was, running CentOS 6. I believe I have all the data off it, with the possible exception of an SQL database that's not hugely important, but since I damaged this machine it has become a bit of a quest to get it back up and running. A learning exercise all the way.
So, the machine was laid out like this. Please refrain from criticising the build; I am not a certified professional, and I am already adequately aware that it was not a great design.
/SATA HDD 1
--/15GB EXT4 partition marked as BOOT
--/217GB Linux RAID partition
/SATA HDD 2
--/15GB SWAP partition
--/217GB Linux RAID partition
/SATA HDD 3,4,5,6,7
--/2TB "whole disk" Linux RAID partition
There are two multi-device arrays in the machine, md0 and md1.
md0 consisted of the five 2TB disks in a RAID5 configuration and was mounted at /home.
md1 consisted of the two 217GB partitions in a RAID1 configuration, and on this device I had the volume group vg_boot.
The motherboard of this machine was originally an Asus M3A32-MVP. This board has trouble booting the CentOS install medium (it requires you, IIRC, to downgrade the BIOS and then re-upgrade it after the installation).
The short version is that last weekend I chucked the case and the motherboard in the bin and rebuilt the machine into a spare chassis that had a slightly better motherboard and allowed me to boot the CentOS install media.
I have NOT reinstalled the MD0 disks. I am working only on attempting to get the system back up and running on the two disks from the MD1 device and the vg_boot volume group.
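For the record, these are the read-only commands I've been using to take stock before changing anything (a sketch of my process; /dev/sde2 is just where the surviving RAID1 member happens to be enumerated at the moment):

```shell
# Read-only state gathering; none of these write to the disks
cat /proc/mdstat                    # kernel's view of the md arrays
mdadm --detail /dev/md1             # detail on the degraded RAID1 array
mdadm --examine /dev/sde2           # superblock on the surviving member
pvdisplay                           # LVM physical volumes
vgscan -v -P                        # scan for volume groups, partial mode
lvs -a -o +devices -P               # logical volumes and which PVs they sit on
blkid                               # filesystem types and UUIDs everywhere
```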
So: the commands I know, and have identified from my efforts thus far, yield this:
#mdadm --detail /dev/md1
        Version : 1.1
  Creation Time : Thu Dec 29 19:09:10 2011
     Raid Level : raid1
     Array Size : 227812220 (217.26 GiB 233.28 GB)
  Used Dev Size : 227812220 (217 GiB 233 GB)
   Raid Devices : 2
  Total Devices : 1
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Fri Mar 22 22:35:26 2013
          State : active, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           Name : <machine name>
           UUID : df067917:790fb242:436a8a2e:d6ac85a5
         Events : 5760

    Number   Major   Minor   RaidDevice State
       0       8      66        0      active sync   /dev/sde2
       1       0       0        1      removed
#pvdisplay
  Couldn't find device with uuid kRwuJT-VGMT-ufth-Jfcz-3m7K-WaC5-c1dEf9.
  --- Physical volume ---
  PV Name               unknown device
  VG Name               vg_boot
  PV Size               7.28 TiB / not usable 57.00 MiB
  Allocatable           yes (but full)
  PE Size               128.00 MiB
  Total PE              59616
  Free PE               0
  Allocated PE          59616
  PV UUID               kRwuJT-VGMT-ufth-Jfcz-3m7K-WaC5-c1dEf9

  --- Physical volume ---
  PV Name               /dev/md1
  VG Name               vg_boot
  PV Size               217.26 GiB / not usable 8.87 MiB
  Allocatable           yes (but full)
  PE Size               128.00 MiB
  Total PE              1738
  Free PE               0
  Allocated PE          1738
  PV UUID               Yv9gUc-KyWf-lkqU-bd2e-ExO3-asc2-fdcG0w
#vgscan -v -P
  PARTIAL MODE. Incomplete logical volumes will be processed.
  Wiping cache of LVM-capable devices
  Wiping internal VG cache
  Reading all physical volumes. This may take a while...
  Finding all volume groups
  Finding volume group "vg_boot"
  Couldn't find device with uuid kRwuJT-VGMT-ufth-Jfcz-3m7K-WaC5-c1dEf9.
  There are 1 physical volumes missing.
  Found volume group "vg_boot" using metadata type lvm2
#lvs -a -o +devices -P
  PARTIAL MODE. Incomplete logical volumes will be processed.
  Couldn't find device with uuid kRwuJT-VGMT-ufth-Jfcz-3m7K-WaC5-c1dEf9.
  LV      VG      Attr      LSize    Devices                 (pool, origin, data, move, log columns all empty)
  lv_home vg_boot -wi-----p    5.8t  unknown device(5465)
  lv_home vg_boot -wi-----p    5.8t  unknown device(29808)
  lv_home vg_boot -wi-----p    5.8t  /dev/md1(0)
  lv_opt  vg_boot -wi-----p 488.38G  unknown device(21994)
  lv_root vg_boot -wi-----p 217.25G  unknown device(3907)
  lv_usr  vg_boot -wi-----p 488.38G  unknown device(25901)
  lv_var  vg_boot -wi-----p 488.38G  unknown device(0)
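Given that lvs shows most of the logical volumes leaning on the missing PV, one thing I'm tempted to try (and would appreciate a sanity check on first) is a partial activation, to see which logical volumes can come up at all. A sketch of what I have in mind:

```shell
# Activate whatever the VG can manage without the missing PV.
# My understanding is that LVs with extents on the missing device
# either stay down or come up incomplete; I'd value confirmation.
vgchange -ay --partial vg_boot
lvs -a -o +devices vg_boot          # see which LVs actually activated
```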
That is about all I can stand to transcribe off the screen in front of me. I guess from typing all this out it's not inconceivable that the RAID array is resynchronising itself and just needs to be left to get on with its thing, but that said I really don't know.
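For what it's worth, my understanding is that a resync announces itself in /proc/mdstat. I can't paste that file from the machine right now, so the sample below is illustrative (not my machine's actual output) of what I believe I'd see if it were resynchronising:

```shell
# Illustrative sample only; on the real machine the check is: cat /proc/mdstat
cat > /tmp/mdstat.sample <<'EOF'
Personalities : [raid1]
md1 : active raid1 sde2[0]
      227812220 blocks super 1.1 [2/1] [U_]
      [=>...................]  recovery =  5.0% (11390611/227812220) finish=120.1min speed=31622K/sec
EOF
# A resync/recovery in progress shows a progress line like the one above;
# a degraded-but-idle array shows only "[2/1] [U_]" with no progress bar.
grep -E 'resync|recovery' /tmp/mdstat.sample
```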
My aim is to get the system booting. As far as I know I have made no changes to any partition tables, backups or data; the only thing that has changed is the order and position of the drives in the SATA ports.
A final point: depending on which way round the drives are connected, I either get nothing at all (no GRUB, no boot, no nothing), or I get a red "BAD PBR" message.
There was some debate at the office today as to which was the better situation. I think we more or less concluded the error was better, as it suggested that GRUB was getting up and running and then seeing something it didn't like.
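On the "BAD PBR" front, my working theory is that the boot sector on whichever disk the BIOS tries first is stale after the reshuffle. If anyone agrees, I assume the fix is to reinstall GRUB (CentOS 6 uses GRUB legacy) from the rescue environment. A sketch, assuming the rescue shell can mount the system under /mnt/sysimage and that the BIOS boot disk is /dev/sda, both of which are assumptions on my part:

```shell
# From the CentOS 6 rescue environment, if the root filesystem mounts:
chroot /mnt/sysimage                # enter the installed system
grub-install /dev/sda               # assumption: /dev/sda is the BIOS boot disk
```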
I would be delighted to provide any further information about the machine and the output of any commands you would like run.