Trouble updating grub after copying Linux to a RAID1
I'm trying to learn how to copy a Linux installation from a single disk to a RAID1. What I'm doing right now is a test to teach myself how to do it, because when I get to the computer I'll actually be doing it on, I might not have much time to figure it out. So if you look at the desired end product of what I'm doing, it may seem a little pointless, but that's only because it's a test setup. Unfortunately it doesn't quite work.
I'm starting with the Debian installation on my laptop's local drive (/dev/sda), which I'm copying to an external USB drive (/dev/sdb). I've already successfully copied it to some partitions on /dev/sdb:
Quote:
My goal is essentially to repeat what I've just done in copying the Linux installation, but to copy it to a RAID1 instead of to regular /dev/sdb* partitions. The problem is that I can't get grub to work, and the cause might be the RAID1, something wrong with grub, or something else entirely. I have little experience with RAID or grub, so it's hard for me to tell. So here's what I do. I have the following partitions and the following planned setup:
Quote:
Set up the RAID1 array:
Code:
root-prompt# mdadm --create /dev/md0 -n 2 -l 1 /dev/sdb8 /dev/sdb12
Code:
root-prompt# mkfs.ext3 /dev/md0
Code:
root-prompt# mount /dev/md1 /mnt/mountpoint/
Code:
proc /proc proc defaults 0 0
I run update-grub while booted into the non-RAID copy on the external USB drive (/dev/sdb3 mounted on /). It tells me:
Code:
root-prompt# update-grub
My hope is that when I boot from the external drive, grub will offer the option of booting the /dev/sdb9 (or /dev/md1) installation. But when I boot from the external drive, it only offers /dev/sda5 or /dev/sdb3; there's no entry for /dev/sdb9 or /dev/md1. If I run update-grub while booted from the laptop's local drive (/dev/sda), it never mentions finding the installation on /dev/sdb9. It just says:
Code:
root-prompt# update-grub
Any ideas? |
AAAchh! Too much complexity. It looks like you have a good command of software RAID and copying filesystems, but the boot process is getting derailed. Grub is sophisticated, but it is still only a glorified bootloader.
Grub 2 is still rather new. If you have the option, stick with legacy grub, because it is simpler and you can edit /boot/grub/menu.lst to fix things. You may also find that using a separate /boot partition makes things more complicated. Remember, the booting kernel runs from its initrd and does not see your fstab until it has mounted the / filesystem.
Use UUID= instead of /dev/mdx in your fstab, because devices are often named differently by different kernels/udev setups. The UUID you want is the UUID of the filesystem, not of the mdadm-created device. blkid will give you the UUID that mkfs attached to the filesystem when it created it; you can also get it from ls -l /dev/disk/by-uuid.
To simplify your life, when you want to change something on the copied filesystem, use chroot so that you run the actual software in that filesystem. It may make no difference when you are using a copy, but in general it could be a different version.
There are other ways to do such experiments. One is to use a virtual machine such as VirtualBox. Another is to back up the system files, do a bare installation on the actual RAID and then restore to the RAID. Yet another is to mount the copies on your filesystem: change /etc/fstab appropriately and reboot. If your present system boots, the copies will then be in place. This avoids having to tinker with the bootloader.
Good luck and have fun as a computer brain surgeon... |
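A minimal sketch of the UUID= approach: the hypothetical helper fstab_entry_from_blkid below turns one line of blkid output into an /etc/fstab entry. It assumes blkid's usual `DEVICE: KEY="value" ...` output format; the mount options are whatever you pass in, and the defaults,errors=remount-ro options in the usage note simply mirror the fstab line that appears later in this thread.

```shell
# Hypothetical helper: build an fstab entry that mounts a filesystem by UUID.
# $1: one line of blkid output, e.g. '/dev/md1: UUID="..." TYPE="ext3"'
# $2: mount point
# $3: mount options (optional; "defaults" if omitted)
fstab_entry_from_blkid() {
    # The leading space in ' UUID="' and ' TYPE="' avoids matching fields
    # such as PARTUUID= or SEC_TYPE= that some blkid versions also print.
    uuid=$(printf '%s\n' "$1" | sed -n 's/.* UUID="\([^"]*\)".*/\1/p')
    fstype=$(printf '%s\n' "$1" | sed -n 's/.* TYPE="\([^"]*\)".*/\1/p')
    if [ -z "$uuid" ] || [ -z "$fstype" ]; then
        echo "no UUID/TYPE found in: $1" >&2
        return 1
    fi
    # fsck pass 1 for the root filesystem, pass 2 for everything else.
    pass=2
    [ "$2" = "/" ] && pass=1
    printf 'UUID=%s %s %s %s 0 %s\n' "$uuid" "$2" "$fstype" "${3:-defaults}" "$pass"
}
```

For example, `fstab_entry_from_blkid "$(blkid /dev/md1)" / defaults,errors=remount-ro` would print a line beginning with `UUID=` that could stand in for a `/dev/md1 / ...` entry.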
I don't know about GRUB2, but legacy GRUB was pretty much incompatible with RAID and needed to be installed on a single partition.
|
You're very nearly there, actually.
There are 2 steps missing. You need to rebuild your initramfs image to include /etc/mdadm/mdadm.conf and the md kernel modules; that's just a case of:
Code:
# update-initramfs -k all -u
https://help.ubuntu.com/community/Grub2
I hate to be contrary, but because of mdadm.conf, md device names will always point to the same array, and because the md driver is built as a module you need to define all the arrays explicitly in that file anyway, so there is no need to specify md devices by UUID or LABEL. Also, while grub doesn't understand mdadm, RAID1 is a special case, so there is nothing wrong with having /boot on RAID1.
PS: just a personal preference, but use LABELs rather than UUIDs. LABELs can be meaningful; UUIDs are just strings of gibberish. |
Quote:
# definitions of existing MD arrays
Without the initramfs rebuild, you could end up with a system that cannot mount /. If you boot from a partition in a RAID 1, be sure that the boot parameter is ro so that the RAID does not have to resync. All this is automatic on a fresh install, but it can be a problem when RAID is created manually, or after an apt-get dist-upgrade, where the kernel and its modules and drivers are likely to change.
BTW, software RAID is a wonderful tool of GNU/Linux. Redundancy is very useful, of course, but RAID 1 also lets the system seek and transfer several files being read at once, from different drives. It is possible to install the bootloader on each drive of the RAID 1 array so that a drive failure does not cost you bootability. You need one entry in the grub menu for each possible boot drive, and you need to run grub-install for each drive. |
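As a sketch of the "run grub-install for each drive" step: the bootloader goes onto each physical disk that holds a RAID 1 member, not onto each partition. The hypothetical helper below only prints the commands, one per distinct disk, so they can be reviewed before running them as root. It assumes sdX-style names with a numeric partition suffix and would need changes for names like nvme0n1p1.

```shell
# Hypothetical helper: print one grub-install command per physical disk
# holding a RAID 1 member. Nothing is executed.
# Arguments: member partitions, e.g. /dev/sda1 /dev/sdb1
grub_install_commands() {
    # Strip the trailing partition number (/dev/sdb12 -> /dev/sdb),
    # then keep only the first occurrence of each disk, in order.
    printf '%s\n' "$@" \
        | sed 's/[0-9]*$//' \
        | awk '!seen[$0]++ { print "grub-install " $0 }'
}
```

In the test setup from this thread both members (/dev/sdb8 and /dev/sdb12) live on one disk, so it prints a single `grub-install /dev/sdb`; with two internal drives holding /dev/sda1 and /dev/sdb1 it prints one command for each.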
RobertP, I agree wholeheartedly with what you just said, but you've changed the scope (and I don't think I was particularly clear).
In mdadm.conf, array members should be referred to by the array's UUID rather than by listing each device by its /dev/{h,s}d* name. In /etc/fstab, however, there is no reason not to use /dev/md*: the md device names (md0, md1, etc.) will always refer to the same RAID volume. The members, i.e. the physical devices /dev/sda1, /dev/sdb1, etc., do not have a defined order, and as you say, the device called /dev/sda might be called something else the next day. |
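To illustrate the distinction, an ARRAY line in /etc/mdadm/mdadm.conf can identify an array by UUID= rather than by a devices= list of member names. The hypothetical helper below lists ARRAY entries that carry no UUID=; it assumes each ARRAY line is self-contained (no continuation lines) and that the keyword is written in upper case, as `mdadm --detail --scan` prints it.

```shell
# Hypothetical helper: print the md device of every ARRAY line in an
# mdadm.conf-style file that does not identify the array by UUID=.
# $1: path to the config file (e.g. /etc/mdadm/mdadm.conf)
arrays_without_uuid() {
    awk '
        $1 == "ARRAY" {
            has_uuid = 0
            # Fields after the device name are key=value identifiers.
            for (i = 3; i <= NF; i++)
                if ($i ~ /^UUID=/) has_uuid = 1
            if (!has_uuid) print $2
        }
    ' "$1"
}
```

Running `arrays_without_uuid /etc/mdadm/mdadm.conf` prints nothing when every array is identified by UUID.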
Thank you all for your helpful comments. I can tell from your ideas that I'll be learning a lot of good stuff as I do this. There are plenty of terms there that I've heard before but that refer to things I've never dealt with myself.
I hope to let you guys know how it progresses. Oh and happy New Year! |
I just got back to working on this issue.
I installed legacy grub and removed grub2 from that drive. I think RobertP might be right that it's better to stick with legacy grub for the time being. I have run "# update-initramfs -k all -u", but haven't yet gotten far enough in the boot for it to matter. Here's what I've done:
1) /dev/sdb2 has the boot flag set.
2) I have the following entry in grub/menu.lst on /dev/sdb2:
Code:
title Debian GNU/Linux, external single drive
3) I also have the following entry in grub/menu.lst on /dev/sdb2:
Code:
title Debian GNU/Linux, external array
Grub says:
Quote:
The files that I mention in the "external array" menu.lst entry exist, and I can verify that by typing (while booted into the "external single drive" installation): Code:
#mount -t ext3 /dev/md0 /mnt/md0/ |
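With legacy grub, the paths on a stanza's kernel and initrd lines are looked up on the partition named by that stanza's root (hdX,Y) line, not on the Linux root filesystem. A sketch of checking that with a hypothetical helper: it takes a file holding one menu.lst stanza and the directory where the matching partition is mounted (for example /mnt/md0 after the mount above), and reports which referenced files are present there.

```shell
# Hypothetical helper: check that the kernel and initrd named in one legacy
# grub stanza exist on the partition grub will read them from.
# $1: file holding one menu.lst stanza (title/root/kernel/initrd lines)
# $2: directory where the partition named by the stanza's "root" line is mounted
check_stanza_files() {
    result=0
    # Field 2 of the kernel and initrd lines is the path grub will load.
    for path in $(awk '$1 == "kernel" || $1 == "initrd" { print $2 }' "$1"); do
        if [ -e "$2$path" ]; then
            echo "found:   $path"
        else
            echo "missing: $path"
            result=1
        fi
    done
    return "$result"
}
```

A non-zero return means at least one file is missing from that partition, which would suggest the stanza's root line points at the wrong (hdX,Y).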
"title Debian GNU/Linux, external array
root (hd1,8)
kernel /vmlinuz-2.6.30-2-686 root=/dev/md1 ro
initrd /initrd.img-2.6.30-2-686"
is telling the kernel to use /dev/md1 for root, and the kernel may not have an md driver at that point. Point it at one partition in the RAID 1 array (with ro!!!) and it will be able to load the partition, start md and mount as in /etc/fstab. If you are not getting that far, it probably means grub is seeing the wrong devices. Try looking for your kernel by poking around with grub:
"root (hd1,8)
Filesystem is ext2fs, partition type 0x83
kernel /vmlinuz-2.6.30-2-686 root=/dev/md1 ro"
Vary (hd1,8) to (hd0,8), (hd0,7) or (hd1,7), and see whether the kernel is found. Counting is an inexact science when the BIOS does it one way, grub another and the kernel yet another... I suspect this is an "off by one" error. Grub counts from 0, so /dev/sda8 is likely (hd0,7). |
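The arithmetic behind that suggestion can be sketched as a small hypothetical helper. Legacy grub numbers partitions from 0, so partition 8 becomes ,7; the disk number comes from the BIOS order, which grub records (as its best guess) in /boot/grub/device.map as lines like `(hd0) /dev/sda`. The helper assumes sdX-style names. Because the BIOS order can change with the drive you boot from, device.map itself may be wrong.

```shell
# Hypothetical helper: translate a Linux partition name into legacy grub
# (hdX,Y) notation, using a device.map-style file to find the disk number.
# $1: device.map file, lines like "(hd0)  /dev/sda"
# $2: Linux partition, e.g. /dev/sdb8
to_grub_legacy() {
    disk=$(printf '%s\n' "$2" | sed 's/[0-9]*$//')   # /dev/sdb8 -> /dev/sdb
    partnum=${2#"$disk"}                             # /dev/sdb8 -> 8
    [ -n "$partnum" ] || return 1                    # whole disk, no partition
    awk -v disk="$disk" -v part="$partnum" '
        $2 == disk {
            gsub(/[()]/, "", $1)                     # "(hd0)" -> "hd0"
            printf "(%s,%d)\n", $1, part - 1         # grub counts partitions from 0
            found = 1
            exit
        }
        END { if (!found) exit 1 }
    ' "$1"
}
```

With a device.map that lists the external drive first (as when booting from it), `to_grub_legacy /boot/grub/device.map /dev/sdb8` prints `(hd0,7)`.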
I don't want to distract you from getting grub installed, but if you get completely stuck, google "Super Grub Disk" and give it a try. It can usually work out what grub can see and set up the config file accordingly.
|
Thanks, Quakeboy02. Grub works now, but that sounds like a useful thing to try. Chances are it will come in handy in the future.
You were right, RobertP. I needed to use (hd0,7). The correct entry in menu.lst is:
Code:
title Debian GNU/Linux, external array
Code:
title Debian GNU/Linux, external single drive
I had assumed that the correspondence between the letter in /dev/sd?? (a, b, c) and the number directly after hd in (hd?,?) (0, 1, 2) would not depend on which drive you boot from. Apparently I assumed wrong: the correspondence does depend on which drive you boot from. I also had to change
Code:
kernel /vmlinuz-2.6.30-2-686 root=/dev/md1 ro
to
Code:
kernel /vmlinuz-2.6.30-2-686 root=/dev/sdb9 ro
Quote:
Quote:
With root=/dev/sdb9, it still fails to assemble the arrays at the same point. However, it's able to proceed after attaching the external disk, and later on you get:
Code:
[ 15.409485] md: md0 stopped.
So it seems the array still gets assembled properly, and indeed it seems to be working fine. The warning is not important for my test case, since in reality I would do it differently anyway.
Then something bad happens. After booting into the array a couple of times, it reports "read-only file system" during bootup, and then anything that requires writing to the filesystem fails and hangs. Maybe this is because one of the partitions was altered during bootup before the RAID array was assembled, but that wouldn't make sense if I'm right in thinking that ro on the menu.lst kernel line means read-only. Anyway, I might try running e2fsck on the array and booting again to see what happens. |
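On Debian, the ro on the kernel line only makes the kernel mount / read-only at first; the boot scripts normally remount it read-write according to /etc/fstab, and an errors=remount-ro option can later switch it back to read-only after a filesystem error. A minimal sketch for checking the current mode, using a hypothetical helper that reads /proc/mounts-style input:

```shell
# Hypothetical helper: print "ro" or "rw" for the filesystem mounted at a
# given mount point, using /proc/mounts-style input.
# $1: mounts file (normally /proc/mounts)   $2: mount point (default /)
mount_mode() {
    awk -v mp="${2:-/}" '
        # Skip the "rootfs" entry that 2.6-era kernels also list for /;
        # if a mount point appears more than once, the last entry wins.
        $2 == mp && $1 != "rootfs" { mode = $4 }
        END {
            if (mode == "") exit 1
            n = split(mode, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro" || opts[i] == "rw") { print opts[i]; exit 0 }
            exit 1
        }
    ' "$1"
}
```

Calling `mount_mode /proc/mounts` after boot prints `rw` if the remount succeeded and `ro` if / stayed (or went back to) read-only.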
"it reports "read-only file system" during bootup, "
This may indicate an I/O error on the disk. Some distros mount the / filesystem with an error option that switches it to read-only. Here's an entry in my fstab: "/dev/sda2 / jfs errors=remount-ro". This option lets you take corrective action, like making a backup, while you still can... Check the logs for I/O errors. The smartmontools package may also help diagnose problems before they kill your system. |
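As a rough sketch of "check the logs", the hypothetical helper below filters kernel log lines for a few message patterns often seen with disk, filesystem or md trouble. The pattern list is an assumption for illustration and far from complete.

```shell
# Hypothetical helper: print kernel log lines that look like I/O,
# filesystem or md errors. Reads the file given as $1, or stdin.
# The patterns are illustrative only; adjust them for your kernel.
io_error_lines() {
    grep -E -i \
        'I/O error|EXT[234]-fs error|remounting filesystem read-only|md: .*(kicking|failed)' \
        "${1:--}"
}
```

Typical calls would be `dmesg | io_error_lines` or `io_error_lines /var/log/kern.log`.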
You're right that there's definitely an error occurring with the file system on /dev/md1.
I rebooted into the laptop's local drive and did: Code:
# umount /dev/md0 /dev/md1 /dev/md3
/proc/mdstat, when booted from the laptop's local drive (with the external drive plugged in), reports no issues:
Code:
$ cat /proc/mdstat
Code:
# cat /proc/mdstat
Code:
# mdadm --manage /dev/md1 --add /dev/sdb9
Code:
# cat /etc/mtab
Then I happened to open gparted and looked at the line for /dev/sdb9. It says:
Code:
Partition File System Mount Point Size Used Unused Flags
Out of curiosity, I try booting from /dev/sdb13 instead of /dev/sdb9; I have an entry in menu.lst, similar to the one for /dev/sdb9, that lets me do that. First I need to repair /dev/md1, since it's broken again. I boot from the laptop's local drive and run:
Code:
root# umount /dev/md0 /dev/md1 /dev/md3
Again I look at /proc/mdstat:
Code:
$ cat /proc/mdstat
Code:
# mdadm --manage /dev/md1 --add /dev/sdb13
Code:
Partition File System Mount Point Size Used Unused Flags
Maybe the problem is that I specify /dev/sdb9 or /dev/sdb13 in menu.lst but /dev/md1 in /etc/fstab. Maybe / gets mounted before /etc/fstab is read. But if that were the case, why would there even be an entry for / in /etc/fstab? In the real-life situation this is going to apply to, I might or might not have the same problem, since I plan to work with a pair of regular internal hard drives rather than a USB drive. But even so, it would be interesting to know exactly what's going on. |
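The repair cycle described in this thread (mark the stale member as failed, remove it, add it back, wait for the resync, then check the filesystem on the array) can be sketched as a dry run. The helper is hypothetical and only prints the commands for review; they would be run as root from a system that is not using the array. `mdadm --wait` blocks until the resync finishes, and `e2fsck -f` forces a full check.

```shell
# Hypothetical helper: print the commands for re-syncing one RAID 1 member
# and then checking the filesystem on the array. Nothing is executed.
# $1: array (e.g. /dev/md1)   $2: member to re-add (e.g. /dev/sdb13)
resync_member_plan() {
    md=$1
    member=$2
    printf '%s\n' \
        "umount $md" \
        "mdadm --manage $md --fail $member" \
        "mdadm --manage $md --remove $member" \
        "mdadm --manage $md --add $member" \
        "mdadm --wait $md" \
        "e2fsck -f $md"
}
```

For example, `resync_member_plan /dev/md1 /dev/sdb13` lists the six steps for the array from this thread, which can then be run one at a time.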
Are you running a GUI? It could be that your desktop is grabbing the external USB drive to show an icon on the desktop somewhere... I had forgotten your first post about USB. Booting from USB could also be problematic because of the order in which md and other services start: USB devices can take longer to come up and might miss the show.
Some folks have used USB thumb drives for RAID, e.g. http://linuxgazette.net/151/weiner.html, but that author did not boot from them, and the drives were all identical. md could have some problems with devices of differing speeds. |
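One way to see whether a slow USB member missed array assembly is to look for an underscore in the status brackets of /proc/mdstat (for example [U_] instead of [UU]). A sketch with a hypothetical helper, assuming the usual layout in which each array's name line is followed by a line ending in that status field:

```shell
# Hypothetical helper: print the names of md arrays whose status field in
# /proc/mdstat-style input contains "_" (a missing or failed member).
# Reads stdin, e.g.: degraded_arrays < /proc/mdstat
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1; next }       # e.g. "md1 : active raid1 ..."
        name != "" && $NF ~ /^\[[U_]+\]$/ {     # e.g. "... blocks [2/1] [U_]"
            if ($NF ~ /_/) print name
            name = ""
        }
    '
}
```

No output means every listed array has all of its members.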
Quote:
I'm trying to summarize the clues I have so far about what's going on. I've numbered them below.
(1) If I tell grub to boot using
Code:
kernel /vmlinuz-2.6.30-2-686 root=/dev/md1 ro
(2) This leaves the option of telling grub to boot using either
Code:
kernel /vmlinuz-2.6.30-2-686 root=/dev/sdb9 ro
or
Code:
kernel /vmlinuz-2.6.30-2-686 root=/dev/sdb13 ro
(3) In either case (1) or (2) above, /etc/fstab specifies that /dev/md1 is to be mounted at /:
Code:
/dev/md1 / ext3 defaults,errors=remount-ro 0 1
Code:
[ 15.401600] md: md3 stopped.
Because of this, I don't think it's the GUI, since the odd behaviour of /dev/md1 already seems to appear before the GUI even exists.
(5) Once I'm booted into the system and logged into my desktop, /etc/mtab says that /dev/md1 is mounted at / and does not mention /dev/sdb13 being mounted:
Code:
# cat /etc/mtab
(6) According to /proc/mdstat, /dev/sdb13 is not being used; only /dev/sdb9 is being used for /dev/md1:
Code:
# cat /proc/mdstat
Remember, this corresponds to the case where I chose to boot from the /dev/sdb13 partition in grub.
(8) If I try to unmount /dev/sdb13 as root, I get the following:
Code:
# umount /dev/sdb13
(10) If I run "lsof /dev/sdb13" at the command line, I get a whole bunch of output, including all kinds of GNOME processes such as gnome-session, gnome-keyring-daemon, nm-applet and gnome-screensaver, plus applications I've started myself, such as firefox, gparted and bash.
(11) If I run "lsof /dev/sdb9", nothing gets printed at all.
(12) If I run "lsof /dev/md1", again nothing gets printed at all. So it seems that /dev/sdb13 is the one actually being used, even though /etc/mtab doesn't think it's mounted.
(13) If I boot into something other than this array on the USB drive, for example the laptop's local drive, then there's no problem. The array behaves normally: /dev/sdb9 and /dev/sdb13 are both part of /dev/md1, and no error shows up until I run e2fsck on /dev/md1, which then reports errors in the filesystem.
(14) If I want to reboot into one of the partitions of the array, it isn't sufficient to just run e2fsck on /dev/md1. I have to mark /dev/sdb13 as failed in /dev/md1, remove it from the array, add it back, let the two partitions resync, run e2fsck on /dev/md1, make sure /etc/fstab and perhaps other important files weren't destroyed, and only then reboot.
I've repeated some things I already mentioned in previous posts, because I'm trying to summarize what I know. As you say, RobertP, maybe I'm reaching the limits of what I can do with a USB drive, so chances are good that I won't run into this problem in a real-world setup. But there's something interesting going on here. I want to understand why it behaves the way it does, or why it works at all, and I think that if I could understand it, I'd understand Linux better. Some of the questions on my mind:
Q1. Why do /etc/mtab and umount not know that /dev/sdb13 is mounted, while gparted does? If /etc/mtab and umount are right and /dev/sdb13 isn't mounted, then how is /dev/sdb13 still being used? If gparted is right and /dev/sdb13 is mounted, then why doesn't /etc/mtab know this? Did /etc/mtab come too late in the game or something?
Q2. How does gparted determine what's mounted and where, since it seems to be aware of something that /etc/mtab doesn't know about?
Q3. Am I right in thinking that some kind of discrepancy arises from the difference between the partition that /boot/grub/menu.lst specifies as root on the kernel line and the partition that /etc/fstab specifies as mounted on /?
Q4. What else can I do or look at to get clues about what's going on? |
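Two small checks in the direction of Q1, Q3 and Q4 can be sketched with hypothetical helpers. root_param_vs_fstab compares the root= device the kernel was booted with (as found in /proc/cmdline) with the device /etc/fstab lists for /. mtab_vs_kernel compares /etc/mtab with /proc/mounts: on systems of this era /etc/mtab is an ordinary file written by mount and the boot scripts, whereas /proc/mounts is the kernel's own list of mounts, so the two can disagree. mtab_vs_kernel skips the rootfs entry that 2.6 kernels list for /; note that /proc/mounts may show the real root device only as /dev/root, so the first check can be the more telling one.

```shell
# Hypothetical helper 1: compare the kernel's root= parameter with the
# device /etc/fstab lists for "/".
# $1: kernel command line (e.g. "$(cat /proc/cmdline)")   $2: fstab file
root_param_vs_fstab() {
    # Take the last root= on the command line if there is more than one.
    kernel_root=$(printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^root=//p' | tail -n 1)
    fstab_root=$(awk '$1 !~ /^#/ && $2 == "/" { dev = $1 } END { print dev }' "$2")
    if [ "$kernel_root" = "$fstab_root" ]; then
        echo "same: $kernel_root"
    else
        echo "differ: kernel root=$kernel_root, fstab / is $fstab_root"
        return 1
    fi
}

# Hypothetical helper 2: report mount points whose device differs between
# an mtab-style file ($1) and a /proc/mounts-style file ($2).
mtab_vs_kernel() {
    awk 'NR == FNR { dev[$2] = $1; next }
         $1 != "rootfs" && ($2 in dev) && dev[$2] != $1 {
             print $2 ": mtab says " dev[$2] ", kernel says " $1
         }' "$1" "$2"
}
```

Typical calls would be `root_param_vs_fstab "$(cat /proc/cmdline)" /etc/fstab` and `mtab_vs_kernel /etc/mtab /proc/mounts`.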