Under RHEL 4, I'm confused over a software RAID issue, but I'll need to give a little detail first.
I have two servers, Larry and Moe.
Each one has 2 disks in a RAID 1 (mirror) configuration.
Someone else was tasked with making Larry and Moe identical, as Moe is just a spare machine. They messed up and Moe was an incomplete copy. It would boot, but didn't function right and things were missing. Then this became my task.
I removed one drive from Larry and one drive from Moe, placed the Moe drive in Larry, and then used Ghost for Linux to clone Larry's good drive to the drive from Moe.
So Larry is fine as ever, and Moe functions perfectly too, on one drive, which thinks it's part of a broken RAID 1. We'll call this Drive A.
MY QUESTION IS - if I put back Moe's other drive (Drive B), which is a member of the previous RAID with the bad installation, how do I make sure Drive A is dominant and rebuilds itself onto Drive B, wiping out what is there? I don't want Drive B to come up on boot and then rebuild its damaged self onto the good Drive A! I haven't done much with software RAID before, and in the past I was always adding a blank drive into the mix, never one that already has system software on it and could be a potential "competitor".
Can someone give me some advice on getting this RAID back functioning again?
OK, that failed partition on Larry is interesting, but we can come to that later.
Moe has 3 RAID partitions on /dev/sda. Is it SATA or SCSI? It looks like SATA, and this can make a difference in the drive ordering - if you removed what was /dev/sda, then what was /dev/sdb is now /dev/sda. If you put the old drive back in, that will now be /dev/sda, and the drive you want to keep will be /dev/sdb - this gets really confusing.
I see sda3, sda5 and sda6 as part of the mirror sets - are sda1, sda2 and sda4 unmirrored or something else?
I really want to be sure of where I am before I give you any advice and instructions. A copy of the partition table from fdisk would be handy!
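Since the sda/sdb names can swap around, one way to tell the members apart is to read the RAID superblock on each partition and compare the array UUID and the event counter. A minimal sketch, assuming mdadm is installed: sb_field is a hypothetical helper name, and the field labels ("UUID", "Events", "Update Time") are the ones mdadm --examine prints for the old 0.90 superblock format, so check them against your own output.

```shell
#!/bin/sh
# Sketch only: pull one field out of `mdadm --examine` output.
# sb_field is a hypothetical helper; the labels are assumed to match
# the 0.90 superblock listing ("UUID", "Events", "Update Time").

sb_field() {
    # $1 = field label; the examine output is read from stdin.
    # Strips "   Label : " from the first matching line and prints the rest.
    sed -n "s/^[[:space:]]*$1[[:space:]]*:[[:space:]]*//p" | head -n 1
}

# Usage (needs root, run once per member partition):
#   mdadm --examine /dev/sda3 | sb_field UUID
#   mdadm --examine /dev/sda3 | sb_field Events
```

Members of the same array share a UUID. If the two drives report different UUIDs, md sees them as belonging to different arrays rather than as two halves of one mirror.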
You can see sda1, sda2 and sda4 in the fdisk output, below.
THANK YOU for your help so far!
[van@<machine> ~]$ sudo fdisk /dev/sda
The number of cylinders for this disk is set to 30394.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)
Command (m for help): p
Disk /dev/sda: 250.0 GB, 250000000000 bytes
255 heads, 63 sectors/track, 30394 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sda1 * 1 7 56196 de Dell Utility
/dev/sda2 8 530 4200997+ c W95 FAT32 (LBA)
/dev/sda3 531 555 200812+ fd Linux raid autodetect
/dev/sda4 556 30394 239681767+ 5 Extended
/dev/sda5 556 810 2048256 fd Linux raid autodetect
/dev/sda6 811 30394 237633448+ fd Linux raid autodetect
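The same table can be printed without the interactive prompt by using fdisk -l, and the RAID members picked out by their partition type. A small sketch, assuming the "System" column reads "Linux raid autodetect" as it does above. raid_parts is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: list the partitions typed "Linux raid autodetect" (Id fd).
# raid_parts is a hypothetical helper; it reads fdisk output from stdin.

raid_parts() {
    # The device name is always the first column, with or without
    # the boot flag, so printing $1 is enough.
    awk '/Linux raid autodetect/ { print $1 }'
}

# Usage (needs root):
#   fdisk -l /dev/sda | raid_parts
```

Fed the table above, this prints /dev/sda3, /dev/sda5 and /dev/sda6, one per line.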
Last edited by Vanyel; 08-30-2007 at 09:31 AM.
Reason: Correct mistake
Right. Do you know which SATA port the drive you have running is installed on? It needs to be on the first port or we'll end up getting confused when we add the other drive back in.
I would have preferred to wipe the drive we're putting back in completely, but I guess given that it's full of Dell system partitions (this makes me suspect it was on port 1 of the SATA controller initially) that's not an option, so we'll have to hope that everything goes by the book.
So ... what I would do:
1) Take a backup. There is a small chance that this process could go horribly, catastrophically wrong.
2) Make sure the existing drive is on the first SATA port in the system.
3) If you have to change it over, boot the system and do a cat /proc/mdstat to make sure it all looks good (nothing should change from when you last looked at it).
4) Install the second drive on the second SATA port. For this process to work following my instructions, Linux has to see it as /dev/sdb. Things will go horribly wrong if it doesn't.
5) Boot the system and do a cat /proc/mdstat.
If things are going by-the-book, it should show that all the /dev/sdaX volumes are up, and the /dev/sdbX are still down, so we need to add them back into the array. It may figure it out and try to remirror things by itself - the mdstat will tell you remirroring progress, but this has never happened in my experience. If it does, you'll have to wait for it to finish, then verify your data. If there's anything wrong, go for your backup. If by some miracle it remirrors automatically with no problems, then you're done. I strongly suspect this won't be the case, and you'll have to tell it to remirror though.
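If it does need to be told to remirror, the usual way is to add each /dev/sdb partition back into its array with mdadm. The sketch below only prints the commands rather than running them. The md device names and their pairing with partitions 3, 5 and 6 are assumptions (this thread never shows them), so read the real pairing off /proc/mdstat and edit PAIRS before using any of the output.

```shell
#!/bin/sh
# Sketch only: print (do NOT run) the mdadm commands that would add the
# /dev/sdb partitions back into their mirrors.
# ASSUMPTION: the md0/md1/md2 names and their pairing with partitions
# 3, 5 and 6 are hypothetical. Check /proc/mdstat for the real layout.

PAIRS="md0:3 md1:5 md2:6"

readd_cmds() {
    for pair in $PAIRS; do
        md=${pair%%:*}      # text before the colon, e.g. md0
        part=${pair##*:}    # text after the colon, e.g. 3
        echo "mdadm /dev/$md --add /dev/sdb$part"
    done
}

readd_cmds
```

Printing first gives you a chance to check that every target is an sdb partition before anything touches the disks.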
Strick - thanks for the Watch command! I'd never heard of it. Good tool!
AJG - Thanks for ALL your help so far!!! So here's how it went -
After getting some hardware advice from Dell on how to tell which drive should be dominant on reboot (which turned out to be WRONG!) I finally got sick of it and just plugged in Moe B. In the end, Moe A/B is only a copy of Larry A/B anyway, so I could always go back to the source.
No matter WHICH hardware SATA connection the drives were plugged into, Moe B (the Bad drive) was always dominant! It was, however, more messed up than I remembered and never really booted, so Moe A didn't get harmed.
I then remembered SATA *is* hot-pluggable, so I booted up with power and SATA connected to Moe A and only power connected to Moe B. The good drive came up as sda. Then, once logged in, I plugged in Moe B's SATA cable and it became sdb.
From there, ajg, I just followed your instructions and the remirroring seems to be coming along fine! I'll let you know how it finishes!
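For anyone following along, the rebuild progress shows up in /proc/mdstat on a line containing "recovery = N%". A rough sketch for pulling out just the percentage, assuming that line format. recovery_pct is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: print the rebuild percentage for each array that is
# currently recovering. recovery_pct is a hypothetical helper; it reads
# /proc/mdstat-style text on stdin and assumes "recovery = N%" lines.

recovery_pct() {
    sed -n 's/.*recovery *= *\([0-9.]*\)%.*/\1/p'
}

# Usage:
#   recovery_pct < /proc/mdstat
# or watch the whole file refresh every two seconds:
#   watch cat /proc/mdstat
```

It prints nothing when no array is rebuilding.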
A good question, and one that I've never been able to get to the bottom of. It may be something to do with failed blocks on the drive you are trying to mirror to - it's possible that it no longer has enough good blocks to mirror the whole data set. I have one like this, but it's not a production server so I've never bothered to find out why. Could be worth having a look with mdadm to see if this is the case.
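One place to start looking is mdadm --detail, which reports the array state and a count of failed devices. A minimal sketch, assuming the "Failed Devices" label that mdadm --detail prints. failed_devices is a hypothetical helper name and /dev/md1 is only a placeholder for whichever array is misbehaving.

```shell
#!/bin/sh
# Sketch only: read the failed-device count out of `mdadm --detail`.
# failed_devices is a hypothetical helper; it reads the detail output
# from stdin and assumes a "Failed Devices : N" line is present.

failed_devices() {
    sed -n 's/^[[:space:]]*Failed Devices[[:space:]]*:[[:space:]]*//p'
}

# Usage (needs root; /dev/md1 is a placeholder name):
#   mdadm --detail /dev/md1 | failed_devices
```

This only shows what md itself has marked as failed. It says nothing about bad blocks that the drive has not yet reported.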