LinuxQuestions.org


meetscott 07-05-2013 02:42 AM

Raid issues with the last kernel upgrade from 3.2.29 to 3.2.45 on Slackware 14.0
 
Been a long time since I've asked a question here.

The last kernel release for Slackware hosed my RAID installation. Basically, I had to roll back. 3.2.45 seems to be recognizing my arrays as md127 and md126 when the kernel is loading.

A little info on my system...
4 disks

RAID 1 for the boot partition: /dev/md0, holding logical volume /dev/vg1/boot
md metadata version 0.90

RAID 10 for the rest: /dev/md1, holding /dev/vg2/root, /dev/vg2/swap, and /dev/vg2/home
md metadata version 1.2

I'm using the generic kernel. After the upgrade, rebuilding the initrd with mkinitrd, and reinstalling lilo, I get this message:

Code:

mount: mounting /dev/vg2/root on /mnt failed: No such device
ERROR: No /sbin/init found on rootdev (or not mounted). Trouble ahead. 
You can try to fix it. Type 'exit' when things are done.

At this point nothing brings it back to life. I've tried booting both the huge and generic kernels. I have to boot from the Slackware install DVD, remove all the 3.2.45 kernel patch packages, and install the 3.2.29 packages again. After rerunning mkinitrd and reinstalling lilo I have a working system again.

Any thoughts, or have others run into the same problems? I've searched around quite a bit and tried quite a few things, but it looks like this kernel upgrade is a "no go" for software RAID devices being recognized and used in the same way.
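
(For reference, the rebuild step after swapping kernel packages looks roughly like this; I'm showing ext4 as the root filesystem type purely for illustration, substitute whatever your root actually uses:)

Code:

# rebuild the initrd for the new kernel with RAID (-R) and LVM (-L) support,
# pointing it at the LVM root, then reinstall the boot loader
mkinitrd -c -k 3.2.45 -f ext4 -r /dev/vg2/root -m ext4 -R -L -o /boot/initrd.gz
lilo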

wildwizard 07-05-2013 03:14 AM

Hmm, I had a similar issue with one of my RAID partitions not showing up in -current, and I had assumed it was related to the mkinitrd changes that went in.

I did however resolve the problem by ensuring that all RAID partitions are listed in /etc/mdadm.conf before creating the initrd.

That may or may not help as I don't know if the 3.2 series has been getting the same RAID code updates as the 3.9 series.

TracyTiger 07-05-2013 04:58 AM

Just a point of information ....

I'm successfully running Slack64 14.0 with the 3.2.45 kernel with a fully encrypted (except /boot) RAID1/RAID10 setup very similar to yours. However I'm not using LVM.

I use UUIDs in /etc/fstab, and like wildwizard, /etc/mdadm.conf defines the arrays, again with (different) UUIDs. I've had RAID component identification problems in the past when I didn't use UUID so now I always build RAID systems using UUID for configuration information.

It boots up as expected without difficulty. The challenging part was getting the UUIDs correct. Every query-type command seems to produce different UUIDs. Through trial and error I figured out which ones to use.
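
(In case it saves someone the trial and error: the UUID that mdadm.conf wants is the array UUID, while fstab wants the filesystem UUID. A quick way to see both, using md0 purely as an example device:)

Code:

# array UUID -- the one that belongs in /etc/mdadm.conf
mdadm --detail /dev/md0 | grep UUID

# filesystem UUID -- the one that belongs in /etc/fstab
blkid /dev/md0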

You may want to look carefully at mkinitrd, lilo, fstab, & mdadm.conf before giving up on the 3.2.45 kernel.

EDIT: ... and get rid of "root=" in the lilo configuration image section.

meetscott 07-05-2013 11:21 AM

Quote:

Originally Posted by wildwizard (Post 4984558)
Hmm, I had a similar issue with one of my RAID partitions not showing up in -current, and I had assumed it was related to the mkinitrd changes that went in.

I did however resolve the problem by ensuring that all RAID partitions are listed in /etc/mdadm.conf before creating the initrd.

That may or may not help as I don't know if the 3.2 series has been getting the same RAID code updates as the 3.9 series.

Yes, I have those listed in my mdadm.conf. I did check that. I forgot to say so.

Code:

ARRAY /dev/md0 UUID=994ea4ee:2e64f4d5:208cdb8d:9e23b04b
ARRAY /dev/md/1 UUID=d79b38ac:2b0c654d:a16d0a19:babaf044

I've tried a few settings in there but have gotten nowhere. The device is showing up as /dev/md/1. I've tried that and /dev/md1, which is what it originally was.

meetscott 07-05-2013 12:03 PM

Quote:

Originally Posted by Tracy Tiger (Post 4984596)
Just a point of information ....

I'm successfully running Slack64 14.0 with the 3.2.45 kernel with a fully encrypted (except /boot) RAID1/RAID10 setup very similar to yours. However I'm not using LVM.

I use UUIDs in /etc/fstab, and like wildwizard, /etc/mdadm.conf defines the arrays, again with (different) UUIDs. I've had RAID component identification problems in the past when I didn't use UUID so now I always build RAID systems using UUID for configuration information.

It boots up as expected without difficulty. The challenging part was getting the UUIDs correct. Every query-type command seems to produce different UUIDs. Through trial and error I figured out which ones to use.

You may want to look carefully at mkinitrd, lilo, fstab, & mdadm.conf before giving up on the 3.2.45 kernel.

EDIT: ... and get rid of "root=" in the lilo configuration image section.

Interesting. You are not using LVM and you are encrypting. I encrypt my laptop drive and use LVM on it, and the upgrade went okay there. Weird that the UUIDs need to be tweaked now. I saw them being used and didn't think anything of it. They match, so what more could the system be looking for?

I'm a little confused on the last thing. "Get rid of 'root=' in the lilo configuration image section"??? How on earth will it know which partition to use for root? I have 4 partitions that could be used... I'm assuming it doesn't know that swap is swap.

Here's my lilo configuration without the commented-out parts. Keep in mind that this is my configuration with 3.2.29. The configuration is the same for 3.2.45, with those particular values changed.
Code:

append=" vt.default_utf8=0"
boot = /dev/md0
raid-extra-boot = mbr-only

bitmap = /boot/slack.bmp
bmp-colors = 255,0,255,0,255,0
bmp-table = 60,6,1,16
bmp-timer = 65,27,0,255
prompt
timeout = 100
change-rules
reset

vga = 773

image = /boot/vmlinuz-generic-3.2.29
  initrd = /boot/initrd.gz
  root = /dev/vg2/root
  label = 3.2.29
  read-only


TracyTiger 07-05-2013 01:31 PM

Quote:

Originally Posted by meetscott (Post 4984797)
Interesting. You are not using LVM and you are encrypting.

I've used RAID/Encryption both with and without LVM. Both worked. I don't have any current systems running LVM for me to check at the moment.

Quote:

I'm a little confused on the last thing. "Get rid of 'root=' in the lilo configuration image section"??? How on earth will it know which partition to use for root? I have 4 partitions that could be used... I'm assuming it doesn't know that swap is swap.
I believe "root=" isn't needed with initrd because initrd already has the information about which partition to use for root. See the thread here https://www.linuxquestions.org/quest...6/#post4801795 for information on how using "root=" in the lilo image section causes problems.
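
(A sketch of what that ends up looking like for your layout; because mkinitrd's -r option records /dev/vg2/root inside the initrd, the image stanza simply drops the root= line. The 3.2.45 file names are placeholders:)

Code:

image = /boot/vmlinuz-generic-3.2.45
  initrd = /boot/initrd.gz
  label = 3.2.45
  read-only

Remember to rerun lilo after any change to lilo.conf.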

Quote:

Here's my lilo configuration without the commented-out parts. Keep in mind that this is my configuration with 3.2.29. The configuration is the same for 3.2.45, with those particular values changed.
My particular problems in the linked post occurred when I upgraded a running system. I don't know why an upgrade causes issues.

Troubleshooting based on my ignorance follows ...
You may want to force a failure by changing a UUID in mdadm.conf just to see that the information there is actually being utilized and that the UUIDs there are correct when the new kernel is running.

TracyTiger 07-05-2013 03:21 PM

Quote:

Originally Posted by meetscott (Post 4984772)
Yes, I have those listed in my mdadm.conf. I did check that. I forgot to say so.

Code:

ARRAY /dev/md0 UUID=994ea4ee:2e64f4d5:208cdb8d:9e23b04b
ARRAY /dev/md/1 UUID=d79b38ac:2b0c654d:a16d0a19:babaf044

I've tried a few settings in there but have gotten nowhere. The device is showing up as /dev/md/1. I've tried that and /dev/md1, which is what it originally was.

Note that my mdadm.conf file looks more like this:

Code:

ARRAY /dev/md1 metadata=0.90 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx
ARRAY /dev/md2 metadata=1.2 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx
ARRAY /dev/md3 metadata=1.2 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx
ARRAY /dev/md5 metadata=1.2 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx
ARRAY /dev/md6 metadata=1.2 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx

Maybe the missing metadata is important to the new kernel? Perhaps it defaults to version 1.2 so version 0.90 needs to be made explicit?
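
(By the way, the metadata= lines don't have to be written by hand. While the arrays are assembled, mdadm can print them for you:)

Code:

# print ARRAY lines (including metadata= and UUID=) for every assembled array;
# review the output before merging it into /etc/mdadm.conf
mdadm --detail --scan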

meetscott 07-07-2013 02:41 AM

Quote:

Originally Posted by Tracy Tiger (Post 4984889)
Note that my mdadm.conf file looks more like this:

Code:

ARRAY /dev/md1 metadata=0.90 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx
ARRAY /dev/md2 metadata=1.2 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx
ARRAY /dev/md3 metadata=1.2 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx
ARRAY /dev/md5 metadata=1.2 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx
ARRAY /dev/md6 metadata=1.2 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx

Maybe the missing metadata is important to the new kernel? Perhaps it defaults to version 1.2 so version 0.90 needs to be made explicit?

Thanks for the reply. I've tried both, with and without the metadata.

Regarding your previous post...
I've never tried *not* specifying the root device in my lilo.conf. I've been running this way for years and never had a problem. It is also specified in the Slackware documentation Alien Bob wrote. That doesn't make it right and perhaps it is worth trying.

I don't know why this is suddenly becoming an issue. I don't reinstall from scratch unless I must for some new system. I always go through the upgrade process. These LVM Raid 10 configurations have been flawless through these upgrades. I even upgrade one machine remotely as it is colocated. This has also gone well for the last 7 years and I don't know how many upgrades :-)

Incidentally, I have a laptop, which is *not* raid but uses LVM and encryption. The upgrade was okay there. Given the variety of configurations of systems (5 at the moment) I have running Slackware, I'm left with the impression that this is only an issue with Raid and the new 3.2.45 kernel.

TracyTiger 07-07-2013 01:34 PM

Quote:

Originally Posted by meetscott (Post 4984546)
3.2.45 seems to be recognizing my arrays as md127 and md126 when the kernel is loading.

Whenever I don't use default values and I see default values appearing on the screen and in logs, I usually suspect that my configuration setup isn't working (/etc/xxxx.conf) or isn't being referenced as I intended.

Quote:

I've never tried *not* specifying the root device in my lilo.conf. I've been running this way for years and never had a problem. It is also specified in the Slackware documentation Alien Bob wrote. That doesn't make it right and perhaps it is worth trying.
As you probably read in the link to the previous LQ thread, it was Alien Bob who suggested I drop specifying root in the lilo.conf image section.

Quote:

I don't know why this is suddenly becoming an issue.
RAID using initrd and specifying root in lilo worked well for me for a long time also....until it didn't. :)

Perhaps other LQ members have better insight into your issue than I, and would like to respond.

kikinovak 07-07-2013 01:59 PM

Everything running fine here.

Code:

[root@nestor:~] # uname -r
3.2.45
[root@nestor:~] # cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath]
md3 : active raid5 sda3[0] sdd3[3] sdc3[2] sdb3[1]
      729317376 blocks level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
     
md2 : active raid1 sda2[0] sdd2[3] sdc2[2] sdb2[1]
      995904 blocks [4/4] [UUUU]
     
md1 : active raid1 sda1[0] sdd1[3] sdc1[2] sdb1[1]
      96256 blocks [4/4] [UUUU]


meetscott 07-07-2013 02:51 PM

Quote:

Originally Posted by Tracy Tiger (Post 4985774)
Whenever I don't use default values and I see default values appearing on the screen and in logs, I usually suspect that my configuration setup isn't working (/etc/xxxx.conf) or isn't being referenced as I intended.



As you probably read in the link to the previous LQ thread, it was Alien Bob who suggested I drop specifying root in the lilo.conf image section.



RAID using initrd and specifying root in lilo worked well for me for a long time also....until it didn't. :)

Perhaps other LQ members have better insight into your issue than I, and would like to respond.

I didn't read the link before, but I have now. I'll have to give it a try. It seems that might be the key.

Richard Cranium 07-07-2013 06:33 PM

I had no issues upgrading from 3.2.29 to 3.2.45.

My boot partition is on /dev/md0. I do use grub2 instead of lilo and all of my raid arrays auto-assemble instead of being explicitly defined in /etc/mdadm.conf.

Code:

root@darkstar:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath]
md1 : active raid1 sde3[0] sdf3[1]
      142716800 blocks super 1.2 [2/2] [UU]
     
md0 : active raid1 sde2[0] sdf2[1]
      523968 blocks super 1.2 [2/2] [UU]
     
md3 : active raid1 sdc2[0] sda2[1]
      624880192 blocks [2/2] [UU]
     
unused devices: <none>
# pvs
  PV        VG      Fmt  Attr PSize  PFree 
  /dev/md1  mdgroup lvm2 a--  136.09g 136.09g
  /dev/md3  mdgroup lvm2 a--  595.91g  86.62g
  /dev/sdd  testvg  lvm2 a--  111.79g  11.79g
root@darkstar:~#


meetscott 07-14-2013 07:17 PM

Well, I figured I'd let everyone know I tried this, that is, not specifying the root in the lilo.conf image section. I get the exact same results. So I'm completely at a loss as to why I appear to be the only person seeing this behavior.

I give up on this one. I'm just going to be happy running the older kernel. It takes too much time to experiment around with this sort of thing and I have several other projects I need to attend to.

Richard Cranium 07-15-2013 10:19 PM

Quote:

Originally Posted by meetscott (Post 4984546)
Been a long time since I've asked a question here.

The last kernel release for Slackware hosed my RAID installation. Basically, I had to roll back. 3.2.45 seems to be recognizing my arrays as md127 and md126 when the kernel is loading.

I bothered to look at my dmesg output; it appears that my system also starts out using md125, md126 and md127 but later figures out the correct names (some messages removed for clarity)...

Code:

[    4.664805] udevd[1056]: starting version 182
[    4.876982] md: bind<sda2>
[    4.882388] md: bind<sdc2>
[    4.883399] bio: create slab <bio-1> at 1
[    4.883557] md/raid1:md127: active with 2 out of 2 mirrors
[    4.883674] md127: detected capacity change from 0 to 639877316608
[    4.890199]  md127: unknown partition table

[  23.896644]  sde: sde1 sde2 sde3
[  23.902268] sd 8:0:2:0: [sde] Attached SCSI disk
[  24.036540] md: bind<sde2>
[  24.039801] md: bind<sde3>
[  24.126990]  sdf: sdf1 sdf2 sdf3
[  24.132618] sd 8:0:3:0: [sdf] Attached SCSI disk

[  24.264754] md: bind<sdf3>

[  24.266127] md/raid1:md125: active with 2 out of 2 mirrors
[  24.266242] md125: detected capacity change from 0 to 146142003200

[  24.274335] md: bind<sdf2>
[  24.275479] md/raid1:md126: active with 2 out of 2 mirrors
[  24.275593] md126: detected capacity change from 0 to 536543232
[  24.288237]  md126: unknown partition table
[  24.293884]  md125: unknown partition table

[  25.575361] md125: detected capacity change from 146142003200 to 0
[  25.575466] md: md125 stopped.
[  25.575566] md: unbind<sdf3>
[  25.589043] md: export_rdev(sdf3)
[  25.589161] md: unbind<sde3>
[  25.605083] md: export_rdev(sde3)
[  25.605667] md126: detected capacity change from 536543232 to 0
[  25.605771] md: md126 stopped.
[  25.605871] md: unbind<sdf2>
[  25.610029] md: export_rdev(sdf2)
[  25.610136] md: unbind<sde2>
[  25.615016] md: export_rdev(sde2)
[  25.615537] md127: detected capacity change from 639877316608 to 0
[  25.615641] md: md127 stopped.
[  25.615741] md: unbind<sdc2>
[  25.620051] md: export_rdev(sdc2)
[  25.620156] md: unbind<sda2>
[  25.624083] md: export_rdev(sda2)
[  25.772347] md: md3 stopped.
[  25.772979] md: bind<sda2>
[  25.773190] md: bind<sdc2>
[  25.774071] md/raid1:md3: active with 2 out of 2 mirrors
[  25.774188] md3: detected capacity change from 0 to 639877316608
[  25.781286]  md3: unknown partition table
[  25.794571] md: md0 stopped.
[  25.795353] md: bind<sdf2>
[  25.795611] md: bind<sde2>
[  25.796365] md/raid1:md0: active with 2 out of 2 mirrors
[  25.796482] md0: detected capacity change from 0 to 536543232
[  25.808178]  md0: unknown partition table
[  26.014044] md: md1 stopped.
[  26.020403] md: bind<sdf3>
[  26.020649] md: bind<sde3>
[  26.021428] md/raid1:md1: active with 2 out of 2 mirrors
[  26.021544] md1: detected capacity change from 0 to 146142003200
[  26.071258]  md1: unknown partition table

I doubt any of that helps, but if you ever get around to looking at this again, you might want to wade through https://bugzilla.redhat.com/show_bug.cgi?id=606481 which contained more than I ever wanted to know about the subject. (Hell, now I'm not sure why my setup works! :confused: )
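
(If I followed that bug report correctly, arrays that aren't listed in mdadm.conf, or whose recorded name/homehost doesn't match the running system, get assembled as md127 counting downwards. What the superblock actually carries can be checked on a member device; sde2 and md127 here are just taken from my dmesg above:)

Code:

# name and UUID recorded in the member's superblock (metadata 1.x)
mdadm --examine /dev/sde2 | grep -E 'Array UUID|Name'

# what the renamed array reports while running
mdadm --detail /dev/md127 | grep -E 'UUID|Name'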

meetscott 07-16-2013 10:34 AM

Richard Cranium, thanks for taking the time to put that output together so nicely. I saw the same things, only mine doesn't figure it out later. I guess they've put this auto-detection into the kernel now. I imagine I'm going to have to address it someday. I have a few other projects I'm working on at the moment, so I don't really have time to burn figuring this out right now.

Just be grateful it is working :-)

