LinuxQuestions.org


gargamel 05-18-2008 12:12 PM

Slackware 12.1 + RAID 1 + LVM: Can't boot with or without encrypted filesystem
 
Hi everybody,

As it is a rainy Sunday afternoon, I decided to try something more 'challenging' and, of course, ran into trouble: my machine doesn't boot anymore. Seems I got what I was asking for... 8-(

Partitioning and RAID-1
I am trying to install Slackware 12.1 on a system with two identical harddiscs. Each disc has three partitions: one for swap, one for /boot and one for everything else. I set up two RAID-1 arrays: /dev/md0 is for /boot, and /dev/md1 is for the rest. Swap is not part of a RAID array.

Partitioning:

Code:

hda1        Linux swap              2GB
hda2        Linux raid autodetect    128MB
hda3 Boot  Linux raid autodetect    78GB

Code:

hdb1        Linux swap              2GB
hdb2        Linux raid autodetect    128MB
hdb3 Boot  Linux raid autodetect    78GB

RAID-1:

Code:

# mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/hd[ab]2
# mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/hd[ab]3

Encryption with LUKS
So far, so good. Then I continued setting up LVM and harddisc encryption following the chapter Combining LUKS and LVM in README_CRYPT.TXT (on Slackware 12.1 CD 1 or, eg, here: http://ftp.gwdg.de/pub/linux/slackwa...EADME_RAID.TXT).

Code:

# dd if=/dev/urandom of=/dev/md1
# cryptsetup -s 256 -y luksFormat /dev/md1
# cryptsetup luksOpen /dev/md1 slackluks

Note that I applied encryption to the RAID device /dev/md1, not to an ordinary harddisc partition. Although the first step, filling the device with random data using dd, took several hours, this should be correct; see e.g. http://www.saout.de/tikiwiki/tiki-in...RootCryptoraid (although the author uses shred instead of dd).

LVM
I went on, again following README_CRYPT.TXT:
Code:

# pvcreate /dev/mapper/slackluks
# vgcreate -s 32M cryptvg /dev/mapper/slackluks
# lvcreate -L 8G -n root cryptvg
# lvcreate -L 30G -n home cryptvg

I did not create a logical volume for swap, as I prefer not to put swap under LVM control.
Code:

# vgscan --mknodes
# vgchange -ay

Again I skipped the mkswap step, because it isn't necessary when swap is not under LVM control.
Then I ran setup, and selected the mountpoints for /, /boot and /home:
Code:

/dev/cryptvg/root  /
/dev/cryptvg/home  /home
/dev/md0          /boot

[EDIT] Corrected a typo: /boot is /dev/md0, not /dev/md1. [/EDIT]

Note again that I selected /dev/md0 instead of /dev/hda2 or /dev/hdb2 for /boot. The rest of setup went smoothly, as usual. Only the paragraph about liloconfig in README_CRYPT.TXT is a bit confusing:
Quote:

Choose "expert lilo configuration" with the
option "Install to Master Boot Record (MBR)". Select '/dev/cryptvg/root' as
the root partition to boot.
Well, this is not possible: in expert mode there is no option to select or specify the root partition. Simple mode does offer installation to the MBR, and if you choose it, the correct partition is selected anyway, so I guess this is a typo in the text. As I was stuck here, I skipped this step, completed the remaining installation and configuration steps, and then came back to liloconfig. The second attempt was successful: according to the screen messages (if I interpret them correctly), LILO was installed to the MBRs of both RAID-1 discs. To my knowledge this is one advantage of LILO over GRUB, BTW: on RAID-1 systems it is copied automatically to all mirrored discs.

Generic kernel and initial RAM disk
Finally the installation completed. I selected EXIT and continued creating an initrd in a change root environment, as described in README_CRYPT.TXT.
Code:

# chroot /mnt
# mkinitrd -c -k 2.6.24.5-smp -m ext3 -f ext3 -r /dev/cryptvg/root -C /dev/md1 -L -R -l de-latin1-nodeadkeys

Here you see a few differences in my command compared to the one specified in README_CRYPT.TXT. First, I again replaced the ordinary harddisc device with the RAID device name /dev/md1. Second, I added a couple of options: -R for RAID support, as recommended in README_RAID.TXT, and -l de-latin1-nodeadkeys for German keyboard support when entering the passphrase.
Also following the instructions in README_RAID.TXT I decided to switch to the generic kernel by redefining the relevant symlinks in /boot instead of replacing the symlink /boot/vmlinuz in /etc/lilo.conf with a filename.
Code:

# cd /boot
# ln -sf vmlinuz-generic-smp-2.6.24.5-smp vmlinuz
# ln -sf System.map-generic-smp-2.6.24.5-smp System.map
# ln -sf config-generic-smp-2.6.24.5-smp config

I did, however, NOT edit /etc/mkinitrd.conf, as I added the relevant options to the mkinitrd command line I used above.

LILO
Finally I modified /etc/lilo.conf, ran lilo and rebooted.
/etc/lilo.conf (only relevant, added or modified lines shown):
Code:

boot = /dev/md0
raid-extra-boot = mbr-only
image = /boot/vmlinuz
  initrd = /boot/initrd.gz
  root = /dev/cryptvg/root
  label = linux
  read-only

After writing the file to disc, I issued:
Code:

# lilo
I saw exactly the messages mentioned in README_CRYPT.TXT, and rebooted the system.

The problem
On reboot I was in fact asked for a passphrase, as expected. I entered the passphrase I had set earlier with
Code:

# cryptsetup -s 256 -y luksFormat /dev/md1
So far everything looked fine. The RAID system was working. But there were messages that no volume groups were found, before I was prompted to enter my passphrase. After entering the passphrase, I saw the same messages as described in another thread (http://www.linuxquestions.org/questi...-lvm-642609/):
Code:

raid1: raid set md1 active with 2 out of 2 mirrors
mdadm: /dev/md1 has been started with 2 drives.
  Reading all physical volumes. This may take a while...
md: resync of RAID array md1
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth [...]
md: using 128k window, over a total of 76019456 blocks.
  No volume groups found
  No volume groups found
  No volume groups found
Unlocking LUKS crypt volume '/dev/cryptvg/root' on device '/dev/md1':
Enter LUKS passphrase:
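
The resync lines in this log allow a rough upper bound on how long a full resync of md1 could take. The sketch below is plain arithmetic on the two numbers printed by the kernel, assuming the reported blocks are 1 KB each and that the array never runs faster than the guaranteed minimum. In practice md uses idle bandwidth and is usually much faster, so this is a worst case and not a prediction.

```shell
#!/bin/sh
# Worst-case resync time if md only ever achieved its guaranteed minimum.
blocks_kb=76019456      # "over a total of 76019456 blocks" (assumed 1 KB each)
min_kb_per_sec=1000     # "minimum _guaranteed_ speed: 1000 KB/sec/disk"

seconds=$((blocks_kb / min_kb_per_sec))
hours=$((seconds / 3600))
minutes=$(((seconds % 3600) / 60))

echo "worst case: ${hours}h ${minutes}m"
```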

After entering my passphrase I got:
Code:

key slot 0 unlocked.
Command failed: dm_task_set_name: Device /dev/cryptvg/root not found
mount: mounting /dev/mapper//dev/cryptvg/root on /mnt failed: No such file or directory
ERROR:  No /sbin/init found on rootdev (or not mounted). Trouble ahead.
        You can try to fix it. Type 'exit' when things are done.

/bin/sh: can't access tty; job control turned off
/ $

Here I wonder about the system trying to mount /dev/mapper//dev/cryptvg/root on /mnt. Firstly: Shouldn't this be mounted to /? Secondly: Does the double slash // in the path name indicate a problem?
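
One possible reading of the double slash, sketched below: the path in the mount error is what you would get if the init script prefixes /dev/mapper/ to whatever name was passed with -r. This is an assumption drawn only from the error text, not from the actual init script, and the function name mapped_root is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: mimics an init script that ASSUMES the root name
# is a bare device-mapper name and prepends /dev/mapper/ to it.
mapped_root() {
    printf '/dev/mapper/%s\n' "$1"
}

mapped_root /dev/cryptvg/root   # full path in -> /dev/mapper//dev/cryptvg/root
mapped_root cryptroot           # bare name in -> /dev/mapper/cryptroot
```

If that reading is right, the double slash itself is harmless; the trouble is that the resulting path names a device that does not exist. As for /mnt, the later "switch_root: bad newroot /mnt" message suggests that /mnt inside the initrd is only the staging mountpoint from which the real root is switched in.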
Then I entered 'exit' as suggested by one of the last messages, although I don't know whether or when "things are done". What does this mean?
Anyhow, I got:
Code:

/ $ exit
initrd.gz:  exiting
switch_root: bad newroot /mnt
Kernel panic - not syncing: Attempted to kill init!

Now the keyboard LEDs are flashing and the computer responds only to a hard reset.

Failed solution approaches
I really have no clue, what I am doing wrong here. I'd be grateful for any hint. Alien Bob's analysis in http://www.linuxquestions.org/questi...nd-lvm-642609/ may be correct. But the question then is: What's the cause, and how can I fix it? Of course, I followed his advice, rebooted from the installation DVD and tried:

Code:

# mdadm --detail /dev/md0
mdadm: md device /dev/md0 does not appear to be active
# mdadm --detail /dev/md1
mdadm: md device /dev/md1 does not appear to be active

Of course, no physical volumes, volume groups or logical volumes were seen by the system, now. So I recreated the RAID arrays:

Code:

# mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/hd[ab]2
# mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/hd[ab]3

At first, no traces of LVM were detected. But that was apparently due to the fact that the RAID array needed to be sync'ed again, which took quite a long time. However, when both arrays were finally active (Rebuild status: 100% complete) again, I got:
Code:

# vgscan --mknodes
 Reading all physical volumes. This may take a while...
 No volume groups found
 No volume groups found
# vgchange -ay
 No volume groups found

Code:

# pvdisplay -c
# vgdisplay -c
# lvdisplay -c

The output of the last three commands was empty, unsurprisingly.

But then I issued this command:
Code:

# cryptsetup luksOpen /dev/md1 slackluks
And finally, pvscan finds the defined physical volume:
Code:

# pvscan
 PV /dev/mapper/slackluks  VG cryptvg  lvm2 [72.50 GB / 34.50 GB free]
 Total: 1 [72.50 GB] / in use: 1 [72.50 GB] / in no VG: 0 [0  ]

As it seems, the volume group cryptvg and the logical volumes /dev/cryptvg/root and /dev/cryptvg/home are also there, but inactive.
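
The pvscan figures can be cross-checked against the lvcreate commands used earlier. This is a small arithmetic sketch, assuming that the "GB" printed by pvscan means 1024 MB and using the 32 MB extent size from the vgcreate command; the variable names are illustrative only.

```shell
#!/bin/sh
# All sizes in MB so that plain integer arithmetic is enough.
extent_mb=32                 # vgcreate -s 32M
vg_mb=74240                  # 72.50 GB reported by pvscan (72.5 * 1024)
root_mb=$((8 * 1024))        # lvcreate -L 8G
home_mb=$((30 * 1024))       # lvcreate -L 30G

total_extents=$((vg_mb / extent_mb))
used_extents=$(((root_mb + home_mb) / extent_mb))
free_mb=$(((total_extents - used_extents) * extent_mb))

echo "total extents: $total_extents"
echo "used extents:  $used_extents"
echo "free MB:       $free_mb"
```

The free space works out to 35328 MB, which is the 34.50 GB that pvscan reports, so the LVM metadata inside the LUKS container appears to be intact.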

My guess is that this is the case at boot time, too. The question is: Why?

While I can activate the LVs on the VG cryptvg easily and re-iterate the installation process after booting from DVD, I have no idea, how I can activate LVs at boot time, and why this is necessary. (To be honest: It is, as yet, only my un-verified guess, that this is the problem, at all).
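
The dependency order described above (RAID array first, then the LUKS container, then LVM) can be written down as a dry run. The sketch below only prints the commands, so it is safe to run anywhere. Note one deliberate substitution: it uses mdadm --assemble to start an existing array, where the post used mdadm --create. The function name and its arguments are hypothetical.

```shell
#!/bin/sh
# Dry run: print the commands in dependency order instead of executing
# them, so nothing on the machine is touched.
print_unlock_sequence() {
    luksdev=$1   # array holding the LUKS header, e.g. /dev/md1
    mapname=$2   # name given to luksOpen, e.g. slackluks
    shift 2      # remaining arguments: member partitions of the array
    echo "mdadm --assemble $luksdev $*"
    echo "cryptsetup luksOpen $luksdev $mapname"
    echo "vgscan --mknodes"
    echo "vgchange -ay"
}

print_unlock_sequence /dev/md1 slackluks /dev/hda3 /dev/hdb3
```

The point of the ordering is that the physical volume lives inside the encrypted mapping, so a vgscan that runs before luksOpen has nothing to find.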

Thanks a lot for any clue, what's wrong in the above procedure!

Best regards,

gargamel

gargamel 05-18-2008 12:29 PM

Addition: Just looked into /boot/initrd-tree on /mnt (chroot) and found that in load_kernel_modules there is no trace of crypt or LVM or RAID support. Only ext3 support is there. Does this mean that the required modules are not loaded at boot time?

gargamel

gargamel 05-18-2008 01:05 PM

Another addition: I just checked the man page of mkinitrd. While I found the options -C and -L documented, there is no trace of -R for RAID support. This surprises me, because: http://pumpump.blogspot.com/2008/05/...1-generic.html

Hmmm. Does this mean that I have to modify /etc/mkinitrd.conf, as suggested in README_RAID.TXT, and that -R isn't equivalent to that?

gargamel

Alien Bob 05-18-2008 01:52 PM

Quote:

Originally Posted by gargamel (Post 3157099)
Addition: Just looked into /boot/initrd-tree on /mnt (chroot) and found that in load_kernel_modules there is no trace of crypt or LVM or RAID support. Only ext3 support is there. Does this mean that the required modules are not loaded at boot time?

The drivers required for device mapping and disk encryption must be compiled into the kernel or else it won't work. That is why you do not see modules for them in /boot/initrd-tree. The script /boot/initrd-tree/init does all the hard work of setting up the system.
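
Since the work is done by programs called from the init script and not by modules, one way to inspect an initrd tree is to look for those programs instead. The sketch below assumes they live under sbin/ in the tree and are named cryptsetup, mdadm, vgscan and vgchange; both the location and the names are assumptions to adjust against your own /boot/initrd-tree/init.

```shell
#!/bin/sh
# Report which helper programs are present under <tree>/sbin.
# The tool names are an assumption; adjust them to what your init calls.
check_initrd_tools() {
    tree=$1
    for tool in cryptsetup mdadm vgscan vgchange; do
        # -L also catches symlinks whose target is missing outside a chroot
        if [ -e "$tree/sbin/$tool" ] || [ -L "$tree/sbin/$tool" ]; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
        fi
    done
}

# Example against the tree mentioned in the thread (reports everything
# as missing if the directory does not exist on this machine).
check_initrd_tools /boot/initrd-tree
```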

Eric

Alien Bob 05-18-2008 01:57 PM

Quote:

Originally Posted by gargamel (Post 3157116)
Another addition: I just checked the man page of mkinitrd. While I found the options -C and -L documented, there is no trace of -R for RAID support. This surprises me, because: http://pumpump.blogspot.com/2008/05/...1-generic.html

Hmmm. Does this mean that I have to modify /etc/mkinitrd.conf, as suggested in README_RAID.TXT, and that -R isn't equivalent to that?

gargamel

On Slackware 12.1, man mkinitrd gives
Code:

-R    This  option  adds RAID support to the initrd,
      if a static mdadm binary is available on the system.

What are you running I wonder.

Eric

gargamel 05-18-2008 02:04 PM

Quote:

Originally Posted by Alien Bob (Post 3157151)
On Slackware 12.1, man mkinitrd gives
Code:

-R    This  option  adds RAID support to the initrd,
      if a static mdadm binary is available on the system.

What are you running I wonder.

Eric

Ah, thanks, I am trying to install 12.1, but read the man page on a second machine with 12.0. Seems, this is a new option, then.

gargamel

Alien Bob 05-18-2008 03:00 PM

And this is where I think the problem lies (unless it was a typo):
Code:

# mkinitrd -c -k 2.6.24.5-smp -m ext3 -f ext3 -r /dev/cryptvg/root -C /dev/md1 -L -R -l de-latin1-nodeadkeys
The "-C /dev/md1" is wrong. The argument to the -C switch must be the device where you created the LUKS volume. After reading your story, I am certain that /dev/md0 is your RAID volume on top of which you created the LUKS volume. So, the correct mkinitrd command should have been:
Code:

# mkinitrd -c -k 2.6.24.5-smp -m ext3 -f ext3 -r /dev/cryptvg/root -C /dev/md0 -L -R -l de-latin1-nodeadkeys
Eric

gargamel 05-18-2008 03:11 PM

Quote:

Originally Posted by Alien Bob (Post 3157145)
The drivers required for device mapping and disk encryption must be compiled into the kernel or else it won't work. That is why you do not see modules for them in /boot/initrd-tree. The script /boot/initrd-tree/init does all the hard work of setting up the system.

Eric

Ok, then this is correct, as kernel vmlinuz-generic-smp-2.6.24.5-smp has support for both, according to /boot/config-generic-smp-2.6.24.5-smp. I guess the relevant options are:

CONFIG_CRYPTO=y
CONFIG_DM_CRYPT=y

Also, none of README_CRYPT.TXT, README_LVM.TXT and README_RAID.TXT suggests compiling a new kernel. So I would expect that the generic kernel has all that is needed to support disk encryption, LVM and RAID.

Right?

But what's causing my problem, then? Again, thanks for any hint pointing me in the right direction!

gargamel

gargamel 05-18-2008 03:19 PM

Quote:

Originally Posted by Alien Bob (Post 3157199)
And this is where I think the problem lies (unless it was a typo):
Code:

# mkinitrd -c -k 2.6.24.5-smp -m ext3 -f ext3 -r /dev/cryptvg/root -C /dev/md1 -L -R -l de-latin1-nodeadkeys
The "-C /dev/md1" is wrong. The argument to the -C switch must be the device where you created the LUKS volume. After reading your story, I am certain that /dev/md0 is your RAID volume on top of which you created the LUKS volume. So, the correct mkinitrd command should have been:
Code:

# mkinitrd -c -k 2.6.24.5-smp -m ext3 -f ext3 -r /dev/cryptvg/root -C /dev/md0 -L -R -l de-latin1-nodeadkeys
Eric

No, this is almost certainly not the case. /dev/md0 is for /boot, /dev/md1 contains the LUKS volume.

I did:
Code:

# cryptsetup -s 256 -y luksFormat /dev/md1
# cryptsetup luksOpen /dev/md1 slackluks

So the LUKS volume has been created on /dev/md1. I can also verify this by booting with the Slackware 12.1 DVD, logging in to the installer and rebuilding the RAID arrays. After that, no volume groups and such are visible. But after re-issuing

Code:

# cryptsetup luksOpen /dev/md1 slackluks
and entering the passphrase the volume group cryptvg is visible again. I can then mount the logical volumes to mountpoints under /mnt and chroot to /mnt, like with the original installation.

So mixing /dev/md0 with /dev/md1, as you suggest, is not the problem, it seems. Thanks for any further ideas.

gargamel

Alien Bob 05-18-2008 03:54 PM

Yeah I had read incorrectly - /dev/md1 is indeed where you created the LUKS volume. So, the problem is a strange one, because I tested your exact same configuration of RAID, LUKS and LVM in order to test the README_RAID.TXT after it was added to the tree, and did not have any problem.

Eric

gargamel 05-18-2008 04:07 PM

Quote:

Originally Posted by Alien Bob (Post 3157241)
Yeah I had read incorrectly - /dev/md1 is indeed where you created the LUKS volume. So, the problem is a strange one, because I tested your exact same configuration of RAID, LUKS and LVM in order to test the README_RAID.TXT after it was added to the tree, and did not have any problem.

Eric

So, you really tried LUKS on LVM on RAID-1, exactly in that order?
I am asking, because
(1) It seems, that the problem can occur similarly in simpler scenarios, as well: http://www.linuxquestions.org/questi...nd-lvm-642609/
(2) Could it be, that some script involved has been changed shortly before the release of Slackware 12.1, but after you created README_RAID.TXT?

My guess would be that somewhere the order in which things are done is wrong, because everything needed seems to be available. Otherwise I couldn't get things going after booting from the DVD and chroot /mnt, right?

Something is just not used in the right way or order on normal boot, it seems... Would you say that my /etc/lilo.conf is correct?

Any further suggestions highly welcome!

gargamel

gargamel 05-19-2008 03:02 PM

Hi Eric, and all others potentially following this thread,

I just tried what GazL suggested here: http://www.linuxquestions.org/questi...45#post3157745
However, I end up with the very same error message.

Code:

raid1: raid set md1 active with 2 out of 2 mirrors
mdadm: /dev/md1 has been started with 2 drives.
  Reading all physical volumes. This may take a while...
md: resync of RAID array md1
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth [...]
md: using 128k window, over a total of 76019456 blocks.
  No volume groups found
  No volume groups found
  No volume groups found
Unlocking LUKS crypt volume '/dev/cryptvg/root' on device '/dev/md1':
Enter LUKS passphrase:
key slot 0 unlocked.
Command failed: dm_task_set_name: Device /dev/cryptvg/root not found
mount: mounting /dev/mapper//dev/cryptvg/root on /mnt failed: No such file or directory

The question is: Why doesn't dm_task_set_name find device /dev/cryptvg/root after /dev/md1 is unlocked? My guess: Because it is inactive, and there's no automatic
Code:

# vgchange -ay
after unlocking. But how can I verify and fix this, then?
Any ideas or suggestions how I could track the problem further down?

Thanks again!

gargamel

gargamel 05-19-2008 05:32 PM

Just to let you know: Yet another attempt failed. Things are even getting worse...

This time I tried a fresh install, did the steps described in my original post, but did not create an initrd. Instead I tried to use the huge-smp kernel. This kernel has, AFAIK, support for device mapper and harddisc encryption and RAID. So it shouldn't be necessary to create an initial ramdisk.

However, I got:
Code:

md: using 128k window, over a total of 75874880 blocks.
raid1: raid set md0 active with 2 out of 2 mirrors
md: ... autorun DONE.
VFS: Cannot open root device "fd01" or unknown-block(253,1)
Please append a correct "root=" boot option; here are the available partitions:
0300  78150744 hda driver: ide-disk
  0301  2000061 hda1
  0302    128520 hda2
  0303  75874995 hda3
0340  78150744 hdb driver: ide-disk
  0341  2000061 hdb1
  0342    128520 hdb2
  0343  75874995 hdb3
1600  4194302 hdc driver: ide-cdrom
1640  4194302 hdd driver: ide-cdrom
0900    128448 md0 (driver?)
0901  75874880 md1 (driver?)
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(253,1)

Weird!
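
The "fd01" in the VFS message and the four-digit numbers in the partition list are the same kind of value: a hexadecimal device number whose high byte is the major and whose low byte is the minor. A minimal sketch for decoding them (the function name decode_dev is made up for illustration):

```shell
#!/bin/sh
# Split a 16-bit hexadecimal device number, as printed by the kernel,
# into "major,minor": the high byte is the major, the low byte the minor.
decode_dev() {
    dev=$((0x$1))
    printf '%d,%d\n' $((dev >> 8)) $((dev & 255))
}

decode_dev fd01   # 253,1  (the root device the kernel could not open)
decode_dev 0901   # 9,1    (md1 in the partition list)
```

So the boot loader handed the kernel device 253,1 as root, and no device with that number appears in the list of available partitions, which only contains the IDE discs and the two md arrays.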

gargamel

Alien Bob 05-20-2008 03:05 AM

Even though the kernel has the drivers for device-mapping and crypto ciphers compiled-in, this is not all that is needed. The init script in the initrd uses the cryptsetup program to 'unlock' the encrypted partition by asking you for a passphrase. Without initrd, you will never get that partition unlocked and mounted.

Eric

gargamel 05-22-2008 07:48 AM

Thanks, again, Eric, this explains the last effect, at least. In the meantime I prepared my system to completely start over with all of this. I "zeroed" both harddiscs in the machine I am installing Slackware 12.1 on (actually I did # dd if=/dev/urandom of=/dev/hdx) and read some more documentation, and re-read some stuff I had read before.

To save me some work in case I need to do some corrections this time I want to use mkinitrd.conf instead of a single line mkinitrd command with all options. Therefore I read the man page of mkinitrd.conf, and found this:
Quote:

LUKSDEV
When using cryptsetup with an encrypted root partition, use this
variable to define the *actual* device name of the encrypted root
partition and define the *mapped* device name as ROOTDEV.
For example, if your actual root device name in /etc/fstab is:
/dev/mapper/cryptroot on /dev/sda2

Then you'll need to set:
LUKSDEV="/dev/sda2"
ROOTDEV="cryptroot"
Now, looking in my /etc/fstab, I currently have:
Code:

/dev/cryptvg/root    /
/dev/cryptvg/home    /home
/dev/md0            /boot

But I also have mapped devices:
Code:

# ls /dev/mapper/
control  cryptvg-home  cryptvg-root  slackluks

Now I wonder: What's the correct way to specify the root device?
This was my original mkinitrd command:

Code:

# mkinitrd -c -k 2.6.24.5-smp -m ext3 -f ext3 -r /dev/cryptvg/root -C /dev/md1 -L -R -l de-latin1-nodeadkeys
Is it really -r /dev/cryptvg/root, or shouldn't it be one of -r /dev/mapper/cryptvg-root or -r cryptvg-root? The last variant is what man mkinitrd.conf seems to suggest.
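
For reference, the two spellings name the same logical volume: LVM creates the device-mapper node by joining the VG and LV names with a hyphen (and doubling any hyphen inside a name), which matches the ls /dev/mapper/ output above. A small sketch of that mapping, with a hypothetical function name; it does not settle which form mkinitrd expects for -r.

```shell
#!/bin/sh
# Map /dev/<vg>/<lv> to the device-mapper node LVM creates for it.
# LVM joins VG and LV names with a single hyphen and doubles any hyphen
# that is part of a name.
lv_to_mapper() {
    vg=$(printf '%s\n' "$1" | cut -d/ -f3 | sed 's/-/--/g')
    lv=$(printf '%s\n' "$1" | cut -d/ -f4 | sed 's/-/--/g')
    printf '/dev/mapper/%s-%s\n' "$vg" "$lv"
}

lv_to_mapper /dev/cryptvg/root   # /dev/mapper/cryptvg-root
lv_to_mapper /dev/cryptvg/home   # /dev/mapper/cryptvg-home
```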

Here is my current /etc/mkinitrd.conf:
Code:

SOURCE_TREE="/boot/initrd-tree"
CLEAR_TREE="0"
OUTPUT_IMAGE="/boot/initrd.gz"
KERNEL_VERSION="2.6.24.5-smp"
KEYMAP="de-latin1-nodeadkeys"
MODULE_LIST="ext3:mbcache:jbd:uhci-hcd:usbhid"
LUKSDEV="/dev/md1"
ROOTDEV="/dev/cryptvg/root"
ROOTFS="ext3"
RAID="1"
LVM="1"
WAIT="1"

I would expect most of it to be ok, but I am not sure about the MODULE_LIST and ROOTDEV. Is the above ok, or should it rather be one of (1) or (2)?
Code:

MODULE_LIST="ext3:uhci-hcd:usbhid"
LUKSDEV="/dev/md1"
(1) ROOTDEV="/dev/mapper/cryptvg-root"
(2) ROOTDEV="cryptvg-root"

And finally, just for better understanding and completeness: Which of these variants is correct for a single line mkinitrd command? Exactly the same as for mkinitrd.conf, or are there any differences?
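
To compare the two forms side by side, the sketch below turns mkinitrd.conf-style variables into the equivalent one-line command and prints it without running it. The variable-to-flag mapping (-c, -k, -m, -f, -r, -C, -L, -R, -l) is inferred from the command line used earlier in this thread and should be checked against man mkinitrd; ROOTFS is assumed to be the variable that holds the filesystem type, and the function name is hypothetical.

```shell
#!/bin/sh
# Values copied from the mkinitrd.conf quoted in the post
# (ROOTFS is assumed to be the filesystem-type variable).
CLEAR_TREE="0"
KERNEL_VERSION="2.6.24.5-smp"
KEYMAP="de-latin1-nodeadkeys"
MODULE_LIST="ext3:mbcache:jbd:uhci-hcd:usbhid"
LUKSDEV="/dev/md1"
ROOTDEV="/dev/cryptvg/root"
ROOTFS="ext3"
RAID="1"
LVM="1"

# Print the equivalent command line; nothing is executed.
build_mkinitrd_cmd() {
    cmd="mkinitrd"
    if [ "$CLEAR_TREE" = "1" ]; then cmd="$cmd -c"; fi
    cmd="$cmd -k $KERNEL_VERSION -m $MODULE_LIST -f $ROOTFS -r $ROOTDEV"
    if [ -n "$LUKSDEV" ]; then cmd="$cmd -C $LUKSDEV"; fi
    if [ "$LVM" = "1" ]; then cmd="$cmd -L"; fi
    if [ "$RAID" = "1" ]; then cmd="$cmd -R"; fi
    if [ -n "$KEYMAP" ]; then cmd="$cmd -l $KEYMAP"; fi
    echo "$cmd"
}

build_mkinitrd_cmd
```

With CLEAR_TREE="0" the printed command has no -c, which is one visible difference from the original one-line command.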

This time I am asking before trying all of this out, as my last experiments ended with a totally screwed system, and complete confusion about mountpoints, boot sectors and such. I'd rather not provoke these problems to re-appear, as cleaning up everything took some time...

Thanks a lot once again,

gargamel


All times are GMT -5.