LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - General (http://www.linuxquestions.org/questions/linux-general-1/)
-   -   RAID 10 failed - and my installation uses a lvm on top - how can i recover from this? (http://www.linuxquestions.org/questions/linux-general-1/raid-10-failed-and-my-installation-uses-a-lvm-on-top-how-can-i-recover-from-this-872831/)

nicolasdiogo 04-04-2011 05:48 AM

RAID 10 failed - and my installation uses a lvm on top - how can i recover from this?
 
1 Attachment(s)
Hello,

i am using Debian on my system and had a problem over the weekend (cause unknown).
today when trying to reboot it, i get an error message explaining that it cannot mount md0.

there are other messages that say:
md: kicking non-fresh sde2 from array
...
md: kicking non-fresh sdf2 from array
..

please see attachment for dmesg

i have already found this post here:
http://www.linuxquestions.org/questi...-array-416853/

but its suggestion does not work for me.

any idea how i can recover from this?

my system has md0 made up of 6 disks and has lvm on top of it.

i am really out of ideas here - please any suggestions? (note the desperation in my tone!)

thanks,

nicolas

nicolasdiogo 04-04-2011 07:06 AM

i will try to baby-step into this problem.

since my root partition is in LVM and it depends on the RAID10, i cannot mount this partition until i have fixed the RAID10 and loaded LVM, so that i can then edit the necessary config files.

this is a catch-22..

i suppose i have to use a live distro to 'create' my RAID10 and LVM and then mount its root.
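A minimal rescue sequence from a live distro might look like the following (device, VG and LV names here are examples, not taken from the poster's system; `--force` can restart an array whose members were kicked as "non-fresh", at the risk of losing the last writes, so read the mdadm man page first):

```
# From the live environment, scan the superblocks for existing arrays
mdadm --examine --scan

# Try a normal assembly first
mdadm --assemble --scan

# If members were kicked as non-fresh, force assembly (example member list)
mdadm --assemble --force /dev/md0 /dev/sd[a-f]2

# Bring up the LVM sitting on top of the array
vgscan
vgchange -ay

# Mount the root LV read-only to inspect it (VG/LV names are examples)
mount -o ro /dev/mapper/vg0-root /mnt
```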

suggestions?

never say never 04-04-2011 12:50 PM

Gotta hate this first thing Monday morning.

This might be of use; check out the mdadm man pages.
Quote:

--assume-clean
Tell mdadm that the array pre-existed and is known to be clean. It can be useful when trying to recover from a major failure as you can be sure that no data will be affected unless you actually write to the array. It can also be used when creating a RAID1 or RAID10 if you want to avoid the initial resync, however this practice - while normally safe - is not recommended. Use this ony if you really know what you are doing.
However, I would only mount read only for data recovery, then rebuild the array, if it were me.
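If forced assembly fails, the quoted option comes into play when re-creating the array over the old members. A hedged sketch of that last resort (the level, device order, layout and chunk size must match the original exactly, or the data will be scrambled; the values below are examples only):

```
# LAST RESORT: re-create the array in place without a resync.
# Level, device order, layout and chunk size MUST match the original.
mdadm --create /dev/md0 --assume-clean --level=10 --raid-devices=6 \
      --chunk=512 /dev/sda3 /dev/sdb3 /dev/sdc2 /dev/sdd2 /dev/sde2 /dev/sdf2

# Since LVM sits on top of md0, activate it and mount read-only,
# then copy the data off before rebuilding properly
vgchange -ay
mount -o ro /dev/mapper/vg0-root /mnt
```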

Also, do you know what caused the two drives to fail? It appears (guessing here) that sde2 and sdf2 are a RAID1 set that is part of a larger RAID0 volume, is that correct?

When designing RAID10 (or any RAID, for that matter), always look at things like the power and SATA controllers for the drives. For RAID10, set up each RAID1 set so that each drive of the set is on a different SATA channel and a different power supply.

In your case, assuming your setup is like this:
sda3, sdb3 - Raid 1 (Set 1)
sdc2, sdd2 - Raid 1 (Set 2)
sde2, sdf2 - Raid 1 (Set 3)

With these three RAID1 sets part of md0, I would make sure that
sda3, sdc2 and sde2 are not connected to the same power or SATA controller as their mirrors. This helps limit the possibility of a critical failure like the one you experienced.
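To check which controller and port each drive actually sits on, something like this should work (output paths vary by system, and `smartctl` assumes the smartmontools package is installed):

```
# Show which controller/port each whole drive is attached to
ls -l /dev/disk/by-path/ | grep -v part

# Serial numbers help map device names to physical drives and power cables
smartctl -i /dev/sda | grep -i serial
```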

Good luck, please let us know if / how you recover.

nicolasdiogo 04-16-2011 06:45 AM

thanks for the suggestions

it was a complete disk disaster.

3 disks failed - after a power surge.
i have bought new power protection, and i have contacted the company that produced my previous one, as there was a 10000 warranty on any products attached to it. but it seems unlikely that i will get anything.

i had to recover from backups after replacing the one disk that was broken (it just would not power up).

but i am sure that i could have recovered this installation - i must have missed something while being under pressure to do 'something', and done something wrong.

could you please suggest information, or a link, on which steps i should take to collect sufficient information to reassemble a RAID and LVM PRIOR to a disaster?
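For the record, the kind of pre-disaster snapshot being asked about can be collected with commands along these lines (file paths are examples; `/etc/mdadm/mdadm.conf` is where Debian keeps the array definition):

```
# Record the array layout and the member superblocks
mdadm --detail /dev/md0 > /root/md0-detail.txt
mdadm --examine /dev/sd[a-f]2 >> /root/md0-detail.txt

# Make sure the array definition is in the config mdadm reads at boot
mdadm --detail --scan >> /etc/mdadm/mdadm.conf

# Record the LVM layout (vgcfgbackup writes to /etc/lvm/backup by default)
pvdisplay > /root/lvm-layout.txt
vgdisplay >> /root/lvm-layout.txt
lvdisplay >> /root/lvm-layout.txt
vgcfgbackup

# Keep copies of these files somewhere OFF the array itself
```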


thanks,

Nicolas

never say never 05-19-2011 07:17 AM

My best advice is to simulate disasters on non-production systems. Build an array with the same or similar make-up (perhaps using older / smaller drives), then simulate various failures. You will then have plenty of time to figure out how to recover from each failure, and document how you did it. Then if (when) it happens on the production equipment, you are prepared. Really no different than restoring a tape backup so that you are familiar with the process before you have to do it under pressure.
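Such simulations can even be done without spare drives, using loop devices (needs root; device names and sizes below are examples):

```
# Create six small backing files and bind them to loop devices
for i in $(seq 0 5); do
    truncate -s 100M /tmp/disk$i.img
    losetup /dev/loop$i /tmp/disk$i.img
done

# Build a test RAID10 out of them
mdadm --create /dev/md9 --level=10 --raid-devices=6 /dev/loop[0-5]

# Simulate a failure and practice the recovery
mdadm /dev/md9 --fail /dev/loop4 --remove /dev/loop4
mdadm /dev/md9 --add /dev/loop4   # watch the rebuild in /proc/mdstat
```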

Personally, I am in the process of moving all my servers over to VMWare ESXI (the free version). I am creating multiple datastores, backing up the data and images between the stores. In the event of failure, I have to manually move things around and restart servers, but I don't care (within reason) what that equipment is, or where the data is.

trey85stang 05-20-2011 01:22 PM

Quote:

Originally Posted by never say never (Post 4360775)
My best advice is to simulate disasters on non-production systems. Build an array with the same or similar make-up (perhaps using older / smaller drives), then simulate various failures. You will then have plenty of time to figure out how to recover from each failure, and document how you did it. Then if (when) it happens on the production equipment, you are prepared. Really no different than restoring a tape backup so that you are familiar with the process before you have to do it under pressure.

Personally, I am in the process of moving all my servers over to VMWare ESXI (the free version). I am creating multiple datastores, backing up the data and images between the stores. In the event of failure, I have to manually move things around and restart servers, but I don't care (within reason) what that equipment is, or where the data is.

good call, all my fancy software/hardware RAID setups go through complete DR tests with documentation before they go into production. Of course this is in a corporate environment, but even if you are a home-based user you should have a DR plan if you are doing some kind of fancy RAIDing... especially if the data is important to you.

I've never worked with software RAID10, so I'm not sure how to help... I understand you can lose 3 drives with a RAID10, but it would have to be specific drives. RAID5 with 2 hot spares would have saved you, though. Of course you never really plan on losing 3 disks in one shot.

