[SOLVED] Server didn't boot - boot partition on RAID 1
Linux - Server: This forum is for the discussion of Linux software used in a server-related context.
My web hosting company set up two RAID arrays:
an mdadm RAID 1 holding /boot, mirrored across 4 partitions: sda2 / sdb2 / sdc2 / sdd2
an mdadm RAID 5 across 4 partitions: sda3 / sdb3 / sdc3 / sdd3
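With a layout like that, the state of the arrays and their members can be checked with commands along these lines (the md names are illustrative; the post only confirms that /boot sits on /dev/md0):

```shell
# Overview of all arrays and their sync state
cat /proc/mdstat
# Per-array detail: members, failed/spare devices, array UUID
mdadm --detail /dev/md0          # the RAID1 holding /boot
# Members of one array report the same array UUID here
mdadm --examine /dev/sda2 /dev/sdb2
```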
sda accumulated more and more bad sectors, so I opened a ticket. My web hosting company hot-swapped the drive, the two arrays were rebuilt, and both ended up in a clean state.
Then I rebooted the server, and the GRUB bootloader didn't start (nothing at all).
So I booted GRML (a Debian-based rescue system), went into /etc/fstab, and replaced the UUID mapped to /boot with the boot partition from the second drive, /dev/sdb2. It made no difference; the server still didn't boot, so I reverted fstab to the original.
Next I tried rewriting GRUB with grub2-install and grub2-mkconfig, but again no difference. It didn't boot.
I asked the web hosting company to redo their clean install. Then I ran mdadm --fail /dev/md0 /dev/sda2 to simulate, once again, the boot partition sitting on a failing drive. The server rebooted successfully.
Then I failed /dev/sda3 out of its array as well, to simulate the whole drive failing, and the server still rebooted.
Then I changed /etc/fstab once again to mount /dev/sdb2 on /boot, and it didn't boot; I guess that is because the partition is a Linux RAID member.
I wonder whether there was a problem with the initramfs after the drive was replaced. Does anybody have an idea? Is it possible that the UUID change of the new drive had bad consequences?
Do you think that RAID 1 and /boot don't get along, and that hardware RAID is the only right way to do it?
Last edited by bloupbloup; 08-02-2017 at 04:04 AM.
On the hosting company's side, after the server failed to boot they didn't feel concerned about it, even though they had provided the setup. It showed that the 4 mirrored boot partitions were useless in practice.
Plus, the benefit of LVM is nowhere to be seen.
I've never used a hosting company ... so value this how you will.
mdadm works at the partition level and is fine for the purpose. Arguably better than hardware RAID on commodity disks. GRUB plays with it just fine, if configured properly.
The issue with (BIOS/MBR) disks is that the boot record in the MBR is not handled by the RAID replication. It has to be generated properly for each disk in the RAID1 set. Your provider appears not to have bothered, or tested.
And yes, the initrd also has to have enough smarts to continue booting with a degraded RAID. Not rocket science, but if GRUB isn't set up properly, the initrd never gets control.
I do use a hosting company; I've always used RAID 1 for the boot partition, and the server has always booted, even after multiple HDD failures.
Maybe I misunderstood something - in any case here are my thoughts:
1)
Quote:
mdadm RAID1 with the /boot copied across 4 partitions: sda2 / sdb2 / sdc2 / sdd2
You would not "copy" /boot to those partitions.
You would just create e.g. a "/dev/md2" RAID1 (using sda2 / sdb2 / sdc2 / sdd2), and "/dev/md2" would be mounted on the "/boot" directory => anything that you put in there (e.g. grub.cfg, GRUB libraries, kernel, initrd, etc.) would be mirrored automatically across all 4 HDDs.
2)
Quote:
i went to /etc/fstab and i replaced the UUID mapped to the boot by the second boot partition from the second drive...
In "/etc/fstab" you should not point at any HDD partition, but at the RAID1 that you have created and which is mounted on the "/boot" directory; in this example it would point to "/dev/md2".
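For example, an fstab entry along these lines (filesystem type and options are illustrative):

```
# /etc/fstab - mount the md array on /boot, not an individual member partition
/dev/md2  /boot  ext4  defaults  0  2
# or, equivalently, by the array's filesystem UUID as reported by blkid /dev/md2:
# UUID=<uuid-of-md2>  /boot  ext4  defaults  0  2
```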
3)
You would run "grub2-install" 4 times, once against each of the 4 disks (e.g. "grub2-install /dev/sda" + "grub2-install /dev/sdb" + ...) => this way each disk has an MBR, and it won't matter which disk the server's BIOS chooses to boot from.
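As a sketch, those 4 invocations can be wrapped in a loop (device names are an assumption; run as root on the real machine):

```shell
# Install GRUB's MBR boot code on every member disk so the BIOS can
# boot from whichever drive it happens to pick
for disk in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do
    grub2-install "$disk"
done
```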
4)
Not 100% sure if this is absolutely necessary, but I have always done it this way: A) compiled the RAID modules into the kernel (not into the initrd), and B) set the grub config to load the kernel with the parameter "domdadm", which forces the kernel to search for & assemble all RAIDs at a very early stage.
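As a sketch, the parameter from point B) would go into the GRUB defaults file (location and variable name per common GRUB 2 layouts; "domdadm" is the parameter the poster uses, from genkernel-style initramfs setups):

```
# /etc/default/grub
GRUB_CMDLINE_LINUX="domdadm"
```

After editing, regenerate the config with grub2-mkconfig so the parameter lands in grub.cfg.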
Cheers
EDIT:
You have to create at least the RAID1 for the boot partition (I don't remember whether it's needed for the root partition as well) using v0.90 metadata.
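Creating the boot array with the old superblock format would look roughly like this (device and array names are illustrative; v0.90 stores the superblock at the end of the partition, so a boot loader can read the member as if it were a plain filesystem):

```shell
# Sketch only, run as root on real hardware: 4-way RAID1 for /boot, v0.90 metadata
mdadm --create /dev/md2 --metadata=0.90 --level=1 --raid-devices=4 \
    /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2
```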
Last edited by Pearlseattle; 07-28-2017 at 09:51 AM.
If somebody has just replaced a drive, I think they should rebuild the GRUB setup before rebooting:
grub2-install
grub2-mkconfig
and also rebuild the initramfs image:
dracut -f
After a reboot it is more complicated from a rescue OS, because the arrays have to be assembled and mounted, and then you need to work inside a chroot.
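The rescue-OS sequence would look roughly like this (array names and mount points are illustrative assumptions):

```shell
# Assemble the arrays, mount the installed system and enter a chroot
mdadm --assemble --scan
mount /dev/md1 /mnt              # root filesystem array (illustrative name)
mount /dev/md0 /mnt/boot         # /boot array
for fs in dev proc sys; do mount --bind /$fs /mnt/$fs; done
chroot /mnt /bin/bash
# inside the chroot, grub2-install, grub2-mkconfig and dracut -f
# now operate on the installed system
```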
1/ Check that GRUB is installed in the MBR. The following command reads the first sector:
dd bs=512 count=1 if=/dev/sda 2>/dev/null | strings
The string GRUB should appear in the output. If a drive is missing GRUB, it should be reinstalled with grub2-install /dev/***
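To check every member disk at once, a small loop (the device list is an assumption):

```shell
# Print which disks carry GRUB boot code in their first sector
for disk in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do
    if dd bs=512 count=1 if="$disk" 2>/dev/null | strings | grep -q GRUB; then
        echo "$disk: GRUB present"
    else
        echo "$disk: GRUB MISSING"
    fi
done
```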
2/ Check /boot/grub2/grub.cfg. In this file you will find the UUIDs of the arrays.
Compare those UUIDs with the output of the blkid command: in blkid, members of the same array show the same UUID.
In grub.cfg, that UUID appears after the --hint parameters, and it should match. If it does not match in grub.cfg, correct it, because with a wrong UUID your server will not boot.
There is a second UUID after the --hint lines: the filesystem UUID of the RAID array that is mounted on /boot.
3/ On top of that, the initramfs image in /boot contains an /etc/mdadm.conf with two UUIDs, which should match the /dev/md* arrays as reported by blkid.
You have to unpack the image into a folder to see the contents of mdadm.conf.
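Unpacking can be done with lsinitrd on dracut systems, or manually with cpio (the image filename is illustrative; many images are gzip-compressed, so they are fed through zcat first):

```shell
# Extract an initramfs image into a scratch directory to inspect etc/mdadm.conf
mkdir -p /tmp/initrd-extract && cd /tmp/initrd-extract
zcat /boot/initramfs-$(uname -r).img | cpio -idmv
cat etc/mdadm.conf
```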
4/ In my case, I can see only two explanations for why it didn't boot: 1/ GRUB was not in the first sector of the drive and the BIOS had no fallback boot sequence; 2/ an array UUID was changed somewhere after the drive replacement.
Last edited by bloupbloup; 07-30-2017 at 07:08 AM.
I have found the solution using a VM, and it is really simple.
I simply copied the first 512 bytes of the second drive (512 bytes = the MBR) to the first drive, and it booted even when the /boot partition of the first drive was deleted. (Note that this also copies the partition table, which is only safe here because the drives are partitioned identically.)
dd if=/dev/sdb of=/dev/sda bs=512 count=1
Then, after rebooting into the system, I ran:
mdadm --add /dev/mdx /dev/sda2 (to re-add the boot partition to its array)
Then I added the drive back to the RAID 5:
mdadm --add /dev/mdx /dev/sda3
After the sync:
grub2-install /dev/sda
and
grub2-mkconfig
This shows how important it is to back up your MBR.
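Backing up and restoring the MBR is a one-liner each way (paths are illustrative; the first 512 bytes hold 446 bytes of boot code, the partition table, and the boot signature):

```shell
# Back up the full first sector of the disk
dd if=/dev/sda of=/root/sda-mbr.bak bs=512 count=1
# To restore only the boot code, leaving the partition table untouched:
dd if=/root/sda-mbr.bak of=/dev/sda bs=446 count=1
```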
I think many server BIOSes keep no boot order covering all the installed SATA drives. The BIOS tried to boot from the first drive, and since there was no boot code in the MBR after the drive was replaced, it failed.
It is also a mistake to modify fstab for this: the UUIDs in fstab are not the UUIDs of the drives, they are the UUIDs of the RAID arrays.
Last edited by bloupbloup; 08-02-2017 at 04:07 AM.