RAID 10 failed - and my installation uses LVM on top - how can I recover from this?
I am using Debian on my system and had a problem over the weekend (cause not known).
Today when trying to reboot it, I get an error message explaining that it cannot mount md0.
There are other messages that say:
md: kicking non-fresh sde2 from array
md: kicking non-fresh sdf2 from array
Please see the attachment for dmesg.
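For anyone reading along with a saved dmesg log, the "kicking non-fresh" lines can be pulled out with a few lines of Python. This is only a minimal sketch: the function name is made up for illustration, and the sample text is just the two lines quoted above, not the full attachment.

```python
import re

# Matches md messages such as "md: kicking non-fresh sde2 from array".
KICKED = re.compile(r"md: kicking non-fresh (\S+) from array")

def kicked_members(dmesg_text):
    """Return the member devices md dropped as non-fresh, in log order."""
    return KICKED.findall(dmesg_text)

# Sample input: only the two lines quoted in the post.
sample = """md: kicking non-fresh sde2 from array
md: kicking non-fresh sdf2 from array"""

print(kicked_members(sample))  # ['sde2', 'sdf2']
```

Knowing exactly which members were kicked tells you which devices to inspect first with mdadm --examine.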
I have already found this post here:
but its suggestion does not work for me.
Any idea how I can recover from this?
My system has md0 made up of 6 disks and has LVM on top of it.
I am really out of ideas here - please, any suggestions? (note the desperation in my tone!)
I will try to baby-step into this problem.
Since my root partition is in LVM and it depends on RAID10, I cannot mount this partition until I have fixed RAID10 so that LVM loads and I can then edit the necessary config files.
This is a catch-22.
I suppose I have to use a live distro to 'create' my RAID10 and LVM and then mount its root.
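The live-distro route would look roughly like the sequence below. This is a hedged sketch, not a tested recovery procedure: it only prints the commands in order and runs nothing. The volume group name "vg0", the logical volume name "root", and the mount point are placeholders, since the real names are not given in this thread. Note that it uses mdadm --assemble, which reads the existing metadata, rather than --create.

```python
def rescue_steps(vg="vg0", lv="root", mountpoint="/mnt"):
    """Return the commands, in order, as argument lists. Nothing is executed.

    vg, lv and mountpoint are placeholder names for illustration only.
    """
    return [
        ["mdadm", "--assemble", "--scan"],         # assemble arrays from on-disk metadata
        ["cat", "/proc/mdstat"],                   # check which members actually joined
        ["vgscan"],                                # look for volume groups on the array
        ["vgchange", "-ay", vg],                   # activate the logical volumes
        ["mount", f"/dev/{vg}/{lv}", mountpoint],  # mount the root filesystem
    ]

for step in rescue_steps():
    print(" ".join(step))
```

If members are rejected as non-fresh, the mdadm man page describes a --force option for --assemble; read that section carefully before using it on the only copy of your data.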
Gotta hate this first thing Monday morning.
This might be of use: check out the mdadm man pages.
Also, do you know what caused the 2 drives to fail? It appears (guessing here) that sde2 and sdf2 are a Raid 1 set that is part of a larger Raid 0 volume. Is that correct?
When designing Raid 10 (or any Raid, for that matter), always look at things like the power supplies and SATA controllers for the drives. For Raid 10, set up each Raid 1 set so that each drive of the set is on a different SATA channel and a different power supply.
In your case, assuming your setup is like this:
sda3, sdb3 - Raid 1 (Set 1)
sdc2, sdd2 - Raid 1 (Set 2)
sde2, sdf2 - Raid 1 (Set 3)
With these three Raid 1 sets as part of md0, I would make sure that sda3, sdc2, and sde2 are not connected to the same power supply or SATA controller as their mirrors. This helps to limit the possibility of a critical failure like the one you experienced.
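To make the placement rule concrete, here is a small Python sketch that flags any mirror pair whose two halves share a controller or a power supply. The pairs are the assumed layout above; the controller and power supply labels are invented for the example and do not describe the actual machine.

```python
# Mirror pairs as assumed in this post (a guess, not confirmed).
MIRRORS = [("sda3", "sdb3"), ("sdc2", "sdd2"), ("sde2", "sdf2")]

def shared_domains(mirrors, placement):
    """Return the pairs whose two halves share a controller or power supply."""
    problems = []
    for a, b in mirrors:
        ctrl_a, psu_a = placement[a]
        ctrl_b, psu_b = placement[b]
        if ctrl_a == ctrl_b or psu_a == psu_b:
            problems.append((a, b))
    return problems

# Hypothetical placement: device -> (SATA controller, power supply).
placement = {
    "sda3": ("ctrl0", "psu0"), "sdb3": ("ctrl1", "psu1"),
    "sdc2": ("ctrl0", "psu0"), "sdd2": ("ctrl1", "psu1"),
    "sde2": ("ctrl1", "psu0"), "sdf2": ("ctrl1", "psu1"),
}

print(shared_domains(MIRRORS, placement))  # [('sde2', 'sdf2')]
```

In this made-up placement, sde2 and sdf2 sit on the same controller, so a single controller fault could take out both halves of that mirror.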
Good luck, please let us know if / how you recover.
Thanks for the suggestions.
It was a complete disk disaster.
3 disks failed after a power surge.
I have bought new power protection, and I have contacted the company that produced my previous one, as there was a £10000 warranty on any products attached to it. But it seems unlikely that I will get anything.
I had to recover from backups after replacing the one disk that was broken (it just would not power up).
But I am sure that I could have recovered this installation - I must have missed something, or done something wrong, while under pressure to do 'something'.
Could you please point me to information or a link on which steps I should take to collect sufficient information to reassemble a RAID and LVM PRIOR to a disaster?
My best advice is to simulate disasters on non-production systems. Build an array with the same or a similar makeup (perhaps using older / smaller drives), then simulate various failures. You will then have plenty of time to figure out how to recover from each failure and to document how you did it. Then if (when) it happens on the production equipment, you are prepared. It is really no different from test-restoring a tape backup so that you are familiar with the process before you have to do it under pressure.
Personally, I am in the process of moving all my servers over to VMware ESXi (the free version). I am creating multiple datastores, backing up the data and images between the stores. In the event of failure, I have to manually move things around and restart servers, but I don't care (within reason) what that equipment is, or where the data is.
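On the question of what to record before a disaster, one option is a script that saves the array and LVM layout to text files you keep off the machine. The sketch below is an assumption about what is worth saving, not an official checklist; the function names, file names, and the default /dev/md0 are illustrative, and the commands need root to return useful output.

```python
import subprocess
from pathlib import Path

def plan(array="/dev/md0", members=()):
    """Return (filename, command) pairs describing what to record."""
    steps = [
        ("mdadm-detail.txt", ["mdadm", "--detail", array]),     # layout, state, member order
        ("mdadm-scan.txt", ["mdadm", "--detail", "--scan"]),   # ARRAY lines for mdadm.conf
        ("mdstat.txt", ["cat", "/proc/mdstat"]),
        ("pvs.txt", ["pvs"]),
        ("vgs.txt", ["vgs"]),
        ("lvs.txt", ["lvs"]),
        ("blkid.txt", ["blkid"]),                               # filesystem and member UUIDs
    ]
    for dev in members:
        # Per-member superblock, e.g. for /dev/sda3.
        name = "examine-" + Path(dev).name + ".txt"
        steps.append((name, ["mdadm", "--examine", dev]))
    return steps

def collect(outdir, array="/dev/md0", members=()):
    """Run each command and save its output under outdir (run as root)."""
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    for filename, cmd in plan(array, members):
        try:
            result = subprocess.run(cmd, capture_output=True, text=True)
            text = result.stdout + result.stderr
        except OSError as err:
            text = "could not run " + cmd[0] + ": " + str(err) + "\n"
        (out / filename).write_text(text)

# Dry run: show what would be recorded without executing anything.
for filename, cmd in plan(members=["/dev/sda3"]):
    print(filename, "<-", " ".join(cmd))
```

Calling collect() with a directory on a USB stick or another host keeps the notes readable even when the array itself will not assemble. LVM also keeps its own metadata backups under /etc/lvm/backup by default, which is worth copying off the box as well when root lives on the same array.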
I've never worked with software RAID 10, so I'm not sure how to help... I understand you can lose 3 drives with a RAID 10, but it would have to be specific drives. RAID 5 w/ 2 hot spares would have saved you though. Of course, you never really plan on losing 3 disks in one shot.
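The "specific drives" point can be checked by brute force. A minimal Python sketch, assuming the three plain mirror pairs guessed earlier in the thread (the real md layout may differ): the array survives only if no pair loses both of its halves.

```python
from itertools import combinations

# Assumed mirror pairs, taken from the layout guessed earlier in the thread.
MIRRORS = [("sda3", "sdb3"), ("sdc2", "sdd2"), ("sde2", "sdf2")]

def survives(failed, mirrors=MIRRORS):
    """True if every mirror pair still has at least one working half."""
    failed = set(failed)
    return all(not (a in failed and b in failed) for a, b in mirrors)

disks = [d for pair in MIRRORS for d in pair]
triples = list(combinations(disks, 3))
ok = [t for t in triples if survives(t)]

print(len(ok), "of", len(triples), "three-disk failures are survivable")
# 8 of 20 three-disk failures are survivable
```

So under this assumption a triple failure is survivable only when it takes exactly one disk from each pair; any failure that includes both sde2 and sdf2 loses the array.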