LinuxQuestions.org
Old 08-28-2007, 03:23 PM   #1
Vanyel
Member
 
Registered: Jul 2007
Location: NY, NY
Distribution: RHEL, CentOS, FC, Ubuntu
Posts: 111

Rep: Reputation: 17
Software RAID issue with a RHEL 4 server


Under RHEL 4, I'm confused over a software RAID issue, but I'll need to give a little detail first.

I have two servers, Larry and Moe.

Each one has 2 disks in a RAID 1 (mirror) configuration.

Someone else was tasked with making Larry and Moe identical, as Moe is just a spare machine. They messed up and Moe was an incomplete copy. It would boot, but didn't function right and things were missing. Then this became my task.

I removed one drive from Larry and one drive from Moe, placed the Moe drive in Larry, and then used Ghost for Linux to clone Larry's good drive to the drive from Moe.

So Larry is fine as ever, and Moe functions perfectly too, on one drive, which thinks it's part of a broken RAID 1. We'll call this Drive A.

MY QUESTION IS - if I put back Moe's other drive (Drive B), which is a member of the previous RAID with the bad installation, how do I make sure Drive A is dominant and wipes out/rebuilds itself onto Drive B? I don't want Drive B to come up on boot and then rebuild its damaged self onto the good Drive A! I haven't done much with software RAID before, and in the past I was always adding a blank drive into the mix, never one that already has system software and could be a potential "competitor".

Can someone give me some advice on getting this RAID back functioning again?

-Van
 
Old 08-28-2007, 05:56 PM   #2
ajg
Member
 
Registered: Nov 2005
Location: The People's Republic of South Yorkshire
Distribution: FC3, CentOS4&5, Hardy Heron, Mythbuntu
Posts: 62

Rep: Reputation: 15
What does

Code:
cat /proc/mdstat
show on Larry and Moe?
 
Old 08-29-2007, 03:47 PM   #3
Vanyel
Member
 
Registered: Jul 2007
Location: NY, NY
Distribution: RHEL, CentOS, FC, Ubuntu
Posts: 111

Original Poster
Rep: Reputation: 17
Larry

Personalities : [raid1]
md2 : active raid1 sdb5[1] sda5[0]
2048192 blocks [2/2] [UU]

md1 : active raid1 sda6[0]
237633344 blocks [2/1] [U_]

md0 : active raid1 sdb3[1] sda3[0]
200704 blocks [2/2] [UU]

Hey! I didn't realize that failure there. Not sure what that's about. But let's concentrate on Moe.
Drive A is present; Drive B is disconnected.

MOE
Personalities : [raid1]
md2 : active raid1 sda5[1]
2048192 blocks [2/1] [_U]

md1 : active raid1 sda6[1]
237633344 blocks [2/1] [_U]

md0 : active raid1 sda3[1]
200704 blocks [2/1] [_U]

unused devices: <none>


- Van
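
A minimal sketch for picking out the degraded arrays from output like the above without reading it by eye. The function name degraded_arrays is made up for illustration; it only assumes the /proc/mdstat layout shown in this post, where a "_" in the status field ([U_] or [_U]) marks a missing member.

```shell
#!/bin/sh
# List md arrays whose status field (e.g. [U_] or [_U]) shows a missing member.
# Reads mdstat-formatted text on stdin, so it also works on saved output.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }                    # remember the array named on the header line
        /blocks/ && /\[[U_]*_[U_]*\]/ { print name }   # a "_" in the status means a member is missing
    '
}

# Typical use on a live system:
#   degraded_arrays < /proc/mdstat
```

Run against Larry's output it should name only md1; against Moe's, all three arrays.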
 
Old 08-29-2007, 04:18 PM   #4
ajg
Member
 
Registered: Nov 2005
Location: The People's Republic of South Yorkshire
Distribution: FC3, CentOS4&5, Hardy Heron, Mythbuntu
Posts: 62

Rep: Reputation: 15
OK, that failed partition on Larry is interesting, but we can come to that later.

Moe has 3 RAID partitions on /dev/sda. Is it SATA or SCSI? It looks like SATA, and this can make a difference in the drive ordering - if you removed what was /dev/sda, then what was /dev/sdb is now /dev/sda. If you put the old drive back in, that will now be /dev/sda, and the drive you want to keep will be /dev/sdb - this gets really confusing.

I see sda3, sda5 and sda6 as part of the mirror sets - are sda1, sda2 and sda4 unmirrored or something else?

I really want to be sure of where I am before I give you any advice and instructions. A copy of the partition table from fdisk would be handy!

Code:
fdisk /dev/sda
p
q
You need to be root to see the device.
 
Old 08-30-2007, 09:30 AM   #5
Vanyel
Member
 
Registered: Jul 2007
Location: NY, NY
Distribution: RHEL, CentOS, FC, Ubuntu
Posts: 111

Original Poster
Rep: Reputation: 17
No problem. I'm root.

These are SATA drives, btw.

You can see sda1, sda2 and sda4 in the fdisk output, below.

THANK YOU for your help so far!

[van@<machine> ~]$ sudo fdisk /dev/sda
Password:

The number of cylinders for this disk is set to 30394.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/sda: 250.0 GB, 250000000000 bytes
255 heads, 63 sectors/track, 30394 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1           7       56196   de  Dell Utility
/dev/sda2               8         530     4200997+   c  W95 FAT32 (LBA)
/dev/sda3             531         555      200812+  fd  Linux raid autodetect
/dev/sda4             556       30394   239681767+   5  Extended
/dev/sda5             556         810     2048256   fd  Linux raid autodetect
/dev/sda6             811       30394   237633448+  fd  Linux raid autodetect

Last edited by Vanyel; 08-30-2007 at 09:31 AM. Reason: Correct mistake
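
The same partition table can be printed non-interactively with fdisk -l, which makes it easy to filter. The sketch below lists only the partitions typed "fd" (Linux raid autodetect), i.e. the ones that can be mirror members. The function name raid_partitions is an illustrative assumption, and the match relies on the "System" column text shown in the output above.

```shell
#!/bin/sh
# Print the device names of partitions typed "fd" (Linux raid autodetect)
# from `fdisk -l`-style text on stdin.
raid_partitions() {
    awk '/^\/dev\// && /Linux raid autodetect/ { print $1 }'
}

# Usage (as root):
#   fdisk -l /dev/sda | raid_partitions
```

On the table above this should leave sda3, sda5 and sda6, which matches the members seen in /proc/mdstat.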
 
Old 08-31-2007, 06:41 AM   #6
ajg
Member
 
Registered: Nov 2005
Location: The People's Republic of South Yorkshire
Distribution: FC3, CentOS4&5, Hardy Heron, Mythbuntu
Posts: 62

Rep: Reputation: 15

Right. Do you know which SATA port the drive you have running is installed on? It needs to be on the first port or we'll end up getting confused when we add the other drive back in.

I would have preferred to wipe the drive we're putting back in completely, but I guess given that it's full of Dell system partitions (this makes me suspect it was on port 1 of the SATA controller initially) that's not an option, so we'll have to hope that everything goes by the book.

So ... what I would do:

1) Take a backup. There is a small chance that this process could go horribly, catastrophically wrong.

2) Make sure the existing drive is on the first SATA port in the system.

3) If you have to change it over, boot the system and do a
Code:
cat /proc/mdstat
to make sure it all looks good (nothing should change from when you last looked at it).

4) Install the second drive on the second SATA port. For this process to work following my instructions, Linux has to see it as /dev/sdb. Things will go horribly wrong if it doesn't.

5) Boot the system and do a
Code:
cat /proc/mdstat
If things are going by-the-book, it should show that all the /dev/sdaX volumes are up, and the /dev/sdbX are still down, so we need to add them back into the array. It may figure it out and try to remirror things by itself - the mdstat will tell you remirroring progress, but this has never happened in my experience. If it does, you'll have to wait for it to finish, then verify your data. If there's anything wrong, go for your backup. If by some miracle it remirrors automatically with no problems, then you're done. I strongly suspect this won't be the case, and you'll have to tell it to remirror though.

6) So ... if all is looking good, do:
Code:
mdadm /dev/md0 --add /dev/sdb3
mdadm /dev/md2 --add /dev/sdb5
mdadm /dev/md1 --add /dev/sdb6
Keep checking
Code:
cat /proc/mdstat
to verify progress of the remirroring. You can also do this on Larry with the /dev/md1 to try and mirror that back up too:
Code:
mdadm /dev/md1 --add /dev/sdb6
If you're not sure about anything, or something is unclear, come back to me before leaping in with this! I cannot stress how horribly things can go wrong when mucking around with RAID sets!
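
Step 6 can be rehearsed as a dry run before touching the arrays. The sketch below only prints commands; it runs nothing against the disks. The function name readd_plan and the array:partition argument format are made up for illustration, and the mdadm --examine line is an addition that is not part of the steps above: it shows the old RAID superblock on Drive B's partition so you can look at it before adding.

```shell
#!/bin/sh
# Dry run: print the commands for re-adding Drive B's partitions, one array at a time.
# Arguments are array:partition pairs taken from step 6 above.
readd_plan() {
    for pair in "$@"; do
        md=${pair%%:*}      # text before the colon, e.g. md0
        part=${pair#*:}     # text after the colon, e.g. sdb3
        echo "mdadm --examine /dev/$part"        # inspect the stale superblock first
        echo "mdadm /dev/$md --add /dev/$part"   # then add it as the new mirror half
    done
}

# Pairs for Moe, as listed in step 6.
readd_plan md0:sdb3 md2:sdb5 md1:sdb6
```

Read the printed commands against /proc/mdstat and the fdisk output before running any of them by hand as root. If the device names do not match what the system actually shows, stop.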
 
Old 09-04-2007, 12:33 PM   #7
strick1226
Member
 
Registered: Feb 2005
Distribution: CentOS, Fedora, OS X, SLES, Ubuntu
Posts: 273

Rep: Reputation: 51
Great advice. Only thing I can add is the following:

watch -n x cat /proc/mdstat
(where x= number of seconds between updates)


If you're sitting at a terminal and plan to watch it finish, this is the way to go.

Good luck!
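
If you would rather not sit and watch, a rough alternative is to poll until the rebuild is over. This is a sketch only: the function names are made up, the 30-second interval is arbitrary, and it assumes /proc/mdstat mentions "recovery" or "resync" for as long as any array is still rebuilding.

```shell
#!/bin/sh
# Succeeds (exit 0) while mdstat-formatted text on stdin shows a rebuild in progress.
rebuild_in_progress() {
    grep -Eq 'recovery|resync'
}

# Poll /proc/mdstat until no rebuild is reported, then say so.
wait_for_rebuild() {
    while rebuild_in_progress < /proc/mdstat; do
        sleep 30
    done
    echo "rebuild finished"
}

# Usage:
#   wait_for_rebuild
```

A finished rebuild is not the same as a healthy array, so still check for [UU] in /proc/mdstat afterwards.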
 
Old 09-04-2007, 03:02 PM   #8
Vanyel
Member
 
Registered: Jul 2007
Location: NY, NY
Distribution: RHEL, CentOS, FC, Ubuntu
Posts: 111

Original Poster
Rep: Reputation: 17
Strick - thanks for the Watch command! I'd never heard of it. Good tool!

AJG - Thanks for ALL your help so far!!! So here's how it went -

After getting some hardware advice from Dell on how to tell which drive should be dominant on reboot (which turned out to be WRONG!) I finally got sick of it and just plugged in Moe B. In the end, Moe A/B is only a copy of Larry A/B anyway, so I could always go back to the source.

No matter WHICH hardware SATA connection the drives were plugged into, Moe B (the Bad drive) was always dominant! It was, however, more messed up than I remembered and never really booted, so Moe A didn't get harmed.

I then remembered SATA *is* hot-pluggable, so I booted up with power and SATA connected to Moe A and only power connected to Moe B. The good drive came up as sda. Then, once logged in, I plugged in Moe B's SATA cable and it became sdb.

From there ajg, I just followed your instructions and the remirroring seems to be coming along fine! I'll let you know how it finishes!

- Van
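
After a hot-plug like this, it is worth confirming which name the kernel gave the new drive before running any mdadm command against it. A small sketch, assuming the usual /proc/partitions layout with the device name in the fourth column; the function name device_present is made up for illustration.

```shell
#!/bin/sh
# Succeeds if the named block device appears in /proc/partitions-style text on stdin.
device_present() {
    awk -v dev="$1" '$4 == dev { found = 1 } END { exit !found }'
}

# Usage after plugging in the cable:
#   device_present sdb < /proc/partitions && echo "sdb is visible"
```

The match is exact, so asking for sdb does not succeed merely because sdb1 is listed.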
 
Old 09-04-2007, 04:42 PM   #9
Vanyel
Member
 
Registered: Jul 2007
Location: NY, NY
Distribution: RHEL, CentOS, FC, Ubuntu
Posts: 111

Original Poster
Rep: Reputation: 17
Hmmm ... It's done and everything seems fine, except

mdadm /dev/md0 --add /dev/sdb3

doesn't stick. After issuing the command I see a quick recovery process, but after I reboot, I get

[van@<machine> ~]$ cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdb5[0] sda5[1]
2048192 blocks [2/2] [UU]

md1 : active raid1 sdb6[0] sda6[1]
237633344 blocks [2/2] [UU]

md0 : active raid1 sda3[1]
200704 blocks [2/1] [_U]

unused devices: <none>


Why does one half of md0 not come back after reboot?
 
Old 09-07-2007, 04:00 AM   #10
ajg
Member
 
Registered: Nov 2005
Location: The People's Republic of South Yorkshire
Distribution: FC3, CentOS4&5, Hardy Heron, Mythbuntu
Posts: 62

Rep: Reputation: 15
A good question, and one that I've never been able to get to the bottom of. It may be something to do with failed blocks on the drive you are trying to mirror to - it's possible that it no longer has enough good blocks to mirror the whole data set. I have one like this, but it's not a production server so I've never bothered to find out why. Could be worth having a look with mdadm to see if this is the case.
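
One way to "have a look with mdadm" is to compare the RAID superblocks on the two halves of md0. This does not explain the failure by itself; it only gathers evidence. The sketch below pulls the event counter out of mdadm --examine output so the two members can be compared. The function name events_of is made up, and it assumes the output has a line of the form "Events : <value>".

```shell
#!/bin/sh
# Extract the "Events" counter from `mdadm --examine` output on stdin.
events_of() {
    awk -F' : ' '/^ *Events/ { print $2; exit }'
}

# Usage (as root):
#   mdadm --examine /dev/sda3 | events_of
#   mdadm --examine /dev/sdb3 | events_of
#   mdadm --detail /dev/md0
```

If the two counters differ after a completed resync, or mdadm --examine finds no superblock on /dev/sdb3 at all, that narrows down where to look next. It is also worth checking with fdisk that sdb3 is still typed fd.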
 
  


All times are GMT -5.