LinuxQuestions.org
Old 03-25-2009, 04:29 AM   #1
jot
LQ Newbie
 
Registered: Aug 2004
Location: Singapore
Distribution: Ubuntu and Fedora
Posts: 25

Rep: Reputation: 0
FC 10: RAID failure test: grub cannot find boot device


I am trying to set up software RAID on a system with two identical SCSI drives (which hold /boot) and two identical IDE drives (for data). OS version: FC 10, which normally boots from hard disk via grub without problems.

The installation went smoothly and the RAID mirroring appears to work fine, which I checked with
Code:
mdadm --detail /dev/md0
(there are two more RAID devices: md1 and md2, and all look good).
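A minimal sketch for checking all arrays at once instead of one `mdadm --detail` call per device. It assumes the usual /proc/mdstat layout, where a healthy two-disk mirror shows `[UU]` and a degraded one shows an underscore such as `[U_]`; the function name `degraded_arrays` is made up for this example.

```shell
#!/bin/sh
# Print the name of every md array whose status field contains "_",
# i.e. at least one member is missing or failed.
# Reads /proc/mdstat unless another file is given as the first argument.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/     { name = $1 }     # remember the current array name
        /\[[U_]*_[U_]*\]/ { print name }    # status like [U_] or [_U]
    ' "${1:-/proc/mdstat}"
}

# Usage (read-only, no root needed):
#   degraded_arrays
# No output means no array is reported as degraded.
```

Running it before and after unplugging a drive should show which of md0, md1 and md2 lost a member.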

The Problem:
Now I tried to simulate a hard disk failure by unplugging one of the SCSI drives. The result: I cannot boot at all (with either SCSI drive unplugged, and even with either IDE drive unplugged); instead I get the message "Error 21: Selected disk does not exist". Plugging the disk back in makes the error go away.

After checking in a number of places, I discovered one odd thing: the mapping from RAID device to actual drive/partition differs from the one used during installation. During the installation things were set up this way:
Code:
md0 --> sdc1/sdd1 (/boot) (SCSI drives)
md1 --> sdc3/sdd3 (/)     (SCSI drives)
md2 --> sda1/sdb1 (/data) (IDE drives)
But after booting FC 10, the mapping described above had changed. It is now:
Code:
md0 --> sda1/sdb1 (/boot) (SCSI drives)
md1 --> sda3/sdb3 (/)     (SCSI drives)
md2 --> sdc1/sdd1 (/data) (IDE drives)
Question:
Could this "re-mapping" cause the grub problem described above? Is it normal anyway? What could I try?

Any hint/help is appreciated! Thanks
 
Old 03-25-2009, 10:51 AM   #2
mostlyharmless
Senior Member
 
Registered: Jan 2008
Distribution: Slackware -current (multilib) with kernel 3.15.5
Posts: 1,501
Blog Entries: 12

Rep: Reputation: 155Reputation: 155
You have to make sure that grub is installed on both mirrors. I don't know whether FC10 does that for you automatically.

In addition, your BIOS should be set up to boot from one mirror as the first boot device and the other mirror as the second boot device. The fact that booting also failed with an IDE drive unplugged suggests the IDE drives are higher in the BIOS boot order than the SCSI drives, which is usually the default.

As for the mapping changing after booting, that isn't necessarily a problem, but it does mean that when you install grub you have to make sure you're installing to the correct device. To avoid confusion, I recommend a "native" install from something like SuperGrub rather than running grub-install from your installed system. Tab completion at the grub> prompt is your friend.
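A rough way to see whether grub is present on both mirrors is to look at the first sector of each disk. This is only a heuristic sketch: it assumes the grub 0.97 stage1 boot code embeds the literal string "GRUB" in the MBR, and the function name `has_grub_mbr` is made up for this example.

```shell
#!/bin/sh
# Succeed if the first 512 bytes (the MBR) of the given disk or image file
# contain the string "GRUB". grep -a treats the binary data as text.
has_grub_mbr() {
    head -c 512 "$1" | grep -aq GRUB
}

# Usage (as root; the device names are assumptions, adjust to your system):
#   for disk in /dev/sda /dev/sdb; do
#       if has_grub_mbr "$disk"; then echo "$disk: grub found"
#       else echo "$disk: no grub in MBR"; fi
#   done
```

A match only shows that some grub stage1 is there. It does not tell you which (hdX) that copy expects to boot from.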
 
Old 03-26-2009, 04:51 AM   #3
jot
LQ Newbie
 
Registered: Aug 2004
Location: Singapore
Distribution: Ubuntu and Fedora
Posts: 25

Original Poster
Rep: Reputation: 0

Hi Mostlyharmless

Summarizing the good news - RAID recovery after disk failure works now! Thank you!!

Answering your points:
1) Grub was installed on both mirrors. I don't know either whether FC 10 does that automatically; I had already done it manually.

2) My system's BIOS could not be set up to boot from one mirror as the first boot device and the other mirror as the second boot device. This is because only the IDE drives were listed as boot order options. The SCSI drives were not offered. So I did not touch this.

3) Regarding the change in drive order when a disk is missing: the BIOS drive order turned out to be an unexpected third variant, different from the two reported in my first message:

Reported by Super-Grub:
Code:
hd0 --> sda (SCSI drive)
hd1 --> sdb (IDE drive)
hd2 --> sdc (IDE drive)
hd3 --> sdd (SCSI drive)
With either SCSI drive unplugged it became:
Code:
hd0 --> sda (SCSI drive)
hd1 --> sdb (IDE drive)
hd2 --> sdc (IDE drive)
With either IDE drive unplugged:
Code:
hd0 --> sda (SCSI drive)
hd1 --> sdb (IDE drive)
hd2 --> sdc (SCSI drive)
This revelation was the breakthrough. By default, the FC 10 installer had configured hd3 as the boot device, so I had followed that. But given the findings above, this must fail when one disk is missing, because with only three disks hd3 no longer exists.

Solution: set up grub to use hd0 instead of hd3 as the boot device (on both mirrors). This did the trick.

4) Installing grub to the correct device: even the commands of the "normal" grub (FC 10 has grub 0.97) make sure you hit the right partition. It checks for the boot files and reports an error if you don't point it to the correct partition.

5) Super-Grub saved the day, mainly because it revealed the "hdx --> sdxx" mapping. Good advice!
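The fix from point 3 can be sketched roughly like this. It is one common way to do it with the grub 0.97 shell, not necessarily the exact commands I typed: each disk is temporarily mapped to (hd0) with `device` before `root` and `setup` are run, so the boot code written to that disk refers to hd0. The device names /dev/sda and /dev/sdb, the assumption that /boot is the first partition (hd0,0), and the helper name `grub_mirror_script` are all illustrative.

```shell
#!/bin/sh
# Print grub 0.97 shell commands that install the boot loader on every
# disk given as an argument, each time remapping that disk to (hd0).
# Assumption: /boot is the first partition on each disk, i.e. (hd0,0).
grub_mirror_script() {
    for disk in "$@"; do
        printf 'device (hd0) %s\n' "$disk"
        printf 'root (hd0,0)\n'
        printf 'setup (hd0)\n'
    done
    printf 'quit\n'
}

# Review the generated commands first, then (as root) feed them to grub:
#   grub_mirror_script /dev/sda /dev/sdb | grub --batch
```

Printing the commands before piping them into grub makes it easy to compare the device names against the mapping the running system actually uses.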

--
Additional information:
Booting after SCSI disk failure worked without human intervention. However, unplugging one of the IDE drives generated a BIOS error, forcing me to press F1 first in order to boot. But this might just be my particular BIOS.

Once again, thanks!

Jot
 
Old 03-26-2009, 09:11 AM   #4
mostlyharmless
Senior Member
 
Registered: Jan 2008
Distribution: Slackware -current (multilib) with kernel 3.15.5
Posts: 1,501
Blog Entries: 12

Rep: Reputation: 155Reputation: 155
You're welcome, congratulations.
 
  


Tags
fedora 10, grub, raid1

