LinuxQuestions.org (/questions/)
-   Linux - Server (http://www.linuxquestions.org/questions/linux-server-73/)
-   -   Boot problem after RAID configuration (http://www.linuxquestions.org/questions/linux-server-73/boot-problem-after-raid-configuration-856836/)

shaze 01-17-2011 10:25 AM

Boot problem after RAID configuration
 
Dear all,

Please could you help with the following.


A few weeks ago, I reconfigured our server to use RAID 1, with LVM on top of that. I followed as best I could the instructions in http://linuxdevcenter.com/pub/a/linu...vm.html?page=2.

Everything worked fine until this morning, when we had to turn the server off to move it. After that it would not boot up again.

Linux starts booting until it gets to a point where it complains about

VolGroup00 not being detected
and then
mount: could not find filesystem '/dev/root'.
Setting up other filesystems
Setting up new root fs
setuproot: moving /dev/ failed: No such file or directory
I can boot up off a LiveCD and then manually mount disks so I don't think it's any physical problem.

I am running Scientific Linux 5.4

My configuration is as follows: VolGroup00/LogGroup00 consists of
/dev/md0: /dev/sda2 + /dev/sdb2
/dev/md1: /dev/sdc1 + /dev/sdd1
/dev/md2: /dev/sde1 + /dev/sdf1
/dev/sda1 is a Linux partition (83) as is /dev/sdb1 (though not used). The others are all Linux Raid autodetect.
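
For reference, the sort of thing that gets at the data from the live CD (a rough sketch, using the device and volume-group names above):

mdadm --assemble --scan                   # assemble md0, md1 and md2 from their superblocks
vgchange -ay VolGroup00                   # activate the volume group sitting on top of them
cat /proc/mdstat                          # check the arrays and their member disks
lvs                                       # check that LogGroup00 shows up
mount /dev/VolGroup00/LogGroup00 /mnt     # after which the data is reachable under /mnt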

My idea is that /dev/VolGroup00/LogGroup00 should be the root and that the system should boot off /dev/sda1. I'd be quite happy to boot off LogGroup00, but this is just how things ended up.

I can't see anything in the grub.conf (on /dev/sda1) that contradicts this, or in the initrd. The device.map maps hd0 to /dev/sda (so presumably /dev/sda1 is (hd0,0)).

In /etc/fstab (on LogGroup00), things look sensible -- / is /dev/VolGroup00/LogGroup00. As I type this, I realise that there isn't an entry for /boot in /etc/fstab. I'll only be able to test this tomorrow, otherwise I'd already be experimenting.
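
For what it's worth, if /dev/sda1 is meant to be mounted at /boot, the missing line would look roughly like this (the ext3 filesystem type is an assumption on my part):

/dev/sda1    /boot    ext3    defaults    1 2

Though a missing /boot entry on its own shouldn't stop the machine booting, since GRUB reads the kernel and initrd straight off /dev/sda1 before fstab is ever consulted.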

I've done a lot of googling on this and found other people with the same problem, but I can't find a satisfactory solution.

Other than really not wanting to lose the data on my current disks, I am not tied to the current configuration. I'd be quite happy for my boot disk to be LogGroup00. I'd be happy to rejig /dev/sda1 and /dev/sdb1.

Any help gratefully received!

Thanks

indelible 01-17-2011 02:45 PM

grub doesn't know anything about RAID (I don't think so, anyone feel free to correct me). The easiest solution I know of is to have a separate, small boot partition that doesn't use RAID. That way you can at least boot, then mount the volumes.

jlinkels 01-17-2011 02:59 PM

To my understanding GRUB cannot boot from LVM; hence you always need a small boot partition where the initrd image resides. Only then can the initrd load, the LVM driver becomes available and you can access the LVM partition. The same goes for the RAID driver, although in that case the boot partition is allowed on a RAID1 set (not on a RAID5).
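
For a grub-legacy setup like that (separate /boot on /dev/sda1, root on LVM), the grub.conf stanza should look roughly like this -- the kernel version here is only an example:

title Scientific Linux (2.6.18-164.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-164.el5 ro root=/dev/VolGroup00/LogGroup00
        initrd /initrd-2.6.18-164.el5.img

Note that the kernel and initrd paths are relative to the /boot partition itself, because root (hd0,0) points GRUB at /dev/sda1.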

Double-check whether all your devices (sd*) received the same drive names as during the initial configuration. When I have such problems, it's usually because the kernel thinks differently about sd* than it did during installation. Which is understandable, as I often have a USB drive plugged in while installing.
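
One quick way to rule that out is to identify the arrays by UUID instead of by sd* name, for example:

mdadm --detail --scan        # prints ARRAY lines with the UUID of each assembled md device
blkid                        # shows which partitions carry which filesystem/raid UUIDs

and then check that /etc/mdadm.conf (and the copy inside the initrd) still matches what --detail --scan reports.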

jlinkels

shaze 01-17-2011 11:20 PM

Thanks for the replies. My boot partition is /dev/sda1, which is managed by neither LVM nor RAID. My fstab and grub.conf (and everything else I can think of) are consistent with this.

indelible 01-18-2011 02:36 PM

Are you using grub-legacy or grub-2? (v1.9x is grub 2)

If you're using grub-2, you want to edit /etc/default/grub then run 'sudo update-grub' rather than editing grub.conf manually. However, this is more of an aside and I'm grasping at straws (can you tell?).

One other thing I've had issues with (though to fix it you would lose your data...): having the partitions set to "linux raid autodetect" can sometimes cause issues like you're seeing. It was mostly with kernels from around a year ago, but it might still exist. Having that, plus the RAID superblock written as a version newer than 0.90, caused intermittent problems for me. I don't use LVM, though - I use mdadm directly.

So for my raid partition, I created it like so:
sudo mdadm --create /dev/md0 --auto=yes -e 0.90 -l 0 -n 2 /dev/sda4 /dev/sdb4
and my partitions are all type 0x83 (Linux)
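
If you want to see which superblock version an existing array member carries (without recreating anything), something like this works:

sudo mdadm --examine /dev/sda4 | grep -i version

which should report something like 0.90.00 for the old-style superblock.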

I also only have it mounted as /home so it's not quite so critical.

Maybe this will at least point you in the direction of some helpful googling.

shaze 01-19-2011 11:19 PM

Gave up
 
Thanks for the suggestions. In the end, after two days and getting help, I gave up, re-installed the OS and recovered the system from backup. It all works fine, and the config seems the same. I think the problem was some subtle issue involving an older BIOS and an older kernel. Judging from what I found on the web from people who had similar problems, the solution is to doctor the initrd in some way, but the cycle of doing that, rebooting from the live CD, etc. is so long and labour-intensive that I gave up after a few tries.

My primary backup failed too, but fortunately I had a secondary backup and the system is up again. I suppose the take-home lesson for part-time sysadmins like me is that hard disks are cheap. Back up everything, several times. It's a lot cheaper than spending a few days on some obscure bug.
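
For the record, the initrd doctoring people describe for this kind of failure usually amounts to rebuilding it with the RAID and LVM modules preloaded, roughly like this from a chroot into the installed system (the kernel version is only an example):

mkinitrd -f -v --preload=raid1 --preload=dm-mod /boot/initrd-2.6.18-164.el5.img 2.6.18-164.el5

but as I say, the test cycle for each attempt was just too long.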

