[SOLVED] New kernel, 5.1 / 5.1.2 boot failure strangeness
I'm a bit baffled. Running and booting kernel 5.1 on my desktop is fine. However, I've failed to boot my old ThinkPad T400 to an ext4 filesystem. I've had no issue with 5.0.8, and I am using essentially the same lilo and kernel config for 5.0.8, 5.1 and 5.1.2, yet the latter two fail.
With 5.1.2, the boot messages show that the kernel finds the drive, the partitions and the ext4 file system on /dev/sda1. It does not appear to run the init scripts. I get a login prompt for 'Darkstar' immediately but am unable to log in. I know Darkstar is an old Slackware default. I presume I can't log in because /etc/shadow or /etc/passwd might not be the normal ones. But I can't check, as I can't log into 'Darkstar', and who knows what file system it thinks it is using.
My hostname is 'laptop-maint' and /etc/HOSTNAME contains 'laptop-main.fire' not 'Darkstar' so I really wonder where this login prompt is arising from? Looking through /etc and /etc/rc.d the only place 'Darkstar' comes up is in /etc/rc.d/rc.M, but given that /etc/HOSTNAME exists and is readable that default should be ignored.
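The fallback described here can be sketched roughly as follows. This is a minimal illustration of the logic, not a copy of rc.M: the function name pick_hostname is hypothetical, and taking only the part before the first dot is an assumption about how the short hostname is derived.

```shell
# Sketch of the hostname fallback; pick_hostname is a hypothetical helper.
pick_hostname() {
    hostname_file=$1    # normally /etc/HOSTNAME
    fallback=$2         # the script's built-in default, e.g. darkstar
    if [ -r "$hostname_file" ]; then
        # Assumption: the short name is everything before the first dot.
        cut -d. -f1 "$hostname_file"
    else
        # Only reached when the file is missing or unreadable.
        echo "$fallback"
    fi
}
```

If the init scripts had run with a readable /etc/HOSTNAME, the default branch would never be taken, which is why a 'Darkstar' prompt suggests the name is coming from somewhere else.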
I wondered whether it was picking up an old initrd and stalling there, but none is being used. I renamed an old one just in case, no effect.
Given that /dev/sda1 is the only disk partition with a file system, /dev/sda2 being swap and /dev/sda3 being a LUKS volume, and there are no other disks, I really can't see where it is trying to boot from. The only thing left is whether some last-ditch init / getty functionality has been included in newer kernels, with perhaps 'Darkstar' compiled into the kernel, rather than just a panic if it fails to find a root partition, but I am not aware of such functionality.
Any thoughts as to what might be happening?
Last edited by petejc; 05-15-2019 at 05:51 PM.
Reason: typo
Thank you. Building 5.1.3 (which has just come out) with the default hostname changed, so I can tell if it comes from the kernel. That still does not explain why root mounts yet I don't get a normal boot. Still, I will find out if 5.1.3 fixes it for some inexplicable reason.
OK, 5.1.3, with default hostname set to 'Pete_intel'. The kernel mounts the ext4 filesystem on /dev/sda1 but does not run the init scripts as it should and immediately gives me a login, but the hostname is now 'Pete_intel', so it is picking this up from the kernel, not from /etc/HOSTNAME, which is set, nor from the default in /etc/rc.d/rc.M.
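To confirm which name a given build will fall back on, the value can be read straight out of the kernel .config. A small sketch, assuming the "default hostname" changed above is the CONFIG_DEFAULT_HOSTNAME option; the helper name kernel_default_hostname is made up for illustration.

```shell
# Print the default hostname compiled into a kernel config file.
# kernel_default_hostname is a hypothetical helper, not a kernel tool.
kernel_default_hostname() {
    config_file=$1      # e.g. the .config in the top-level source directory
    # Match the quoted string value and print only what is inside the quotes.
    sed -n 's/^CONFIG_DEFAULT_HOSTNAME="\(.*\)"$/\1/p' "$config_file"
}
```

Run as `kernel_default_hostname .config` from the source tree. The kernel uses this name until something in userspace sets a different one, so seeing it at the login prompt is consistent with the init scripts never having set the hostname.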
Solved, sort of.
I've built kernel 5.1.4 on a different machine and that seems to boot OK via lilo. Not sure what the difference is apart from a minor bump in version number and that I used a rather out of date copy of slackware-current to build the kernel.
I think I stumbled upon a seemingly similar problem. My old Dell Precision M4300 laptop running Slackware64 14.2 fails to boot with 5.1.x, although 5.0.6 was working fine. I tried 5.1.4 and 5.1.7, the first built on a relatively modern computer and the second on the Dell itself. In both cases I get the darkstar prompt. However, before it loads the modules from initrd and right after the initialization of eudev I see a "Bus error" line in the output. It appears again after the initrd module-loading messages. Do you remember if you had that error, too?
Btw, in both cases the builds were made on Slackware64 14.2.
I have a similar-sounding problem. 5.1 installed fine on my laptop. But the identical kernel gave boot weirdness on the desktop. The system came up to the login prompt very quickly, without changing from the initial terminal font to a smaller one as it usually did. There was a message from INIT that "SV" was respawning too quickly. When I supplied user name and password, I just got the login prompt again.
5.0.9 works great.
I tried building with the .config from current (kernel-source-4.19.49-noarch-1.txz, file usr/src/linux-4.19.49/arch/x86/configs/x86_64_defconfig). PCI support was turned off initially, so I rebuilt with it on (and SELINUX turned off). That worked slightly better: I could log in, but fonts were weird and X looked horrible.
Just noticed top-level usr/src/linux-4.19.49/.config - will try that.
The desktop mainboard is approx. 10 years old while the laptop is less than 2 years old. The desktop has NVMe disks but they don't seem to be the problem.
Hope it's just some type of config issue but ... what?
Last edited by duncan_roe; 06-12-2019 at 06:26 AM.
There was a message from INIT that "SV" was respawning too quickly. When I supplied user name and password, I just got the login prompt again.
In my case it said "x1" was spawning too fast. However, I believe this is not a manifestation of the main issue but only a side-effect of it. Did you see any "bus error" messages like I did? Given this message and the fact that I was unable to find similar bug reports in Google, I suspect that the culprit is some rare combination of software, such as eudev 3.1.5 (from 2015) and kernel 5.1. I would like to try updating eudev but I don't know how complicated it can get, so I am waiting for now.
I have yet to notice a bus error. Where I am at now is: I do have a .config that will boot, but the resultant system is not very usable. This .config is usr/src/linux-4.19.49/arch/x86/configs/x86_64_defconfig, migrated to 5.1.8 by successive iterations of make xconfig plus some manual diffs to get device 259 recognised (NVMe disks). It is attached.
The system mis-reports /dev/sda as having 1 partition when it actually has 4 (or it might have only seen what is normally /dev/sdb - one of them is SATA and the other is IDE but they both have multiple partitions).
There is no network: ifconfig -a shows sit0 instead of eth0.
In case it's any help, I've attached the dmesg output from that system.
There's more, but I have to go now.
Thanks duncan_roe. I would still suspect eudev, since it is eudev's responsibility to create the device nodes correctly. Perhaps if I have the time I will try upgrading eudev and recompiling other stuff if necessary. This is kind of like Linux from Scratch which I have not much experience with, though, and I may give up early on. Any advice on how to do an eudev upgrade is welcome.
By the way, is there a particular reason why you did not use 'make oldconfig' to migrate your old config? It should have made the switch in one go.
I was under the impression that make xconfig did an implied make oldconfig first. It has worked for me for 25 years anyway. At least it has worked for x.y -> x.y+1, not always for bigger jumps.
I was going to document my further experiences but think I have a better plan now: I'm going to git bisect between v5.0 and v5.1 until I find the culprit patch. When I find it, I will raise a bug report.
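For anyone who wants to follow along, a bisection between two tags can be started roughly like this. It is a sketch, not a record of the exact commands used in this thread; start_bisect is a hypothetical wrapper around standard git bisect subcommands, run inside a clone of the kernel git tree.

```shell
# Begin bisecting between a known-good and a known-bad revision.
# start_bisect is a hypothetical helper name.
start_bisect() {
    good=$1     # e.g. v5.0, which boots normally
    bad=$2      # e.g. v5.1, which shows the boot failure
    git bisect start &&
    git bisect bad "$bad" &&
    git bisect good "$good"
}

# git then checks out a commit roughly halfway between the two.
# Build and boot that kernel, then report the result with one of:
#   git bisect good     (booted normally)
#   git bisect bad      (boot failure seen)
#   git bisect skip     (could not build or test this commit)
# Repeat until git names the first bad commit, then: git bisect reset
```

Here that would be `start_bisect v5.0 v5.1`. Each step halves the remaining range, so the number of kernels to build and boot grows only with the logarithm of the number of commits between the two tags.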
I made an antidote to that patch, attached as revert_459e3a21.txt. When I applied it to Linux 5.1.12, the kernel booted normally. 5.1.12 had just appeared; I guess the patch should work for any 5.1.
Please try this patch. In your top-level Linux source directory, do

    cat revert_459e3a21.txt | patch -p1

Then [re-]build the kernel.
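If you are applying the patch to a different 5.1.x release, it may be worth checking that it applies cleanly before touching the tree. A cautious sketch, assuming GNU patch; try_patch is a hypothetical helper name, and it should be run from the top-level source directory.

```shell
# Apply a patch only if a dry run shows that every hunk applies.
# try_patch is a hypothetical helper, not part of the patch tool.
try_patch() {
    patch_file=$1
    # --dry-run reports what would happen without modifying any files.
    patch -p1 --dry-run < "$patch_file" || return 1
    # The dry run succeeded, so apply the patch for real.
    patch -p1 < "$patch_file"
}
```

For example: `try_patch revert_459e3a21.txt`. If the dry run fails, the source tree is left untouched.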
2 QUESTIONS:
1. Are we all only seeing a problem on old hardware?
2. AMD, Intel or what? (both my old and new systems are AMD)
Any answers will be helpful for the bug report I now need to put together.
Aha, that was quick. Great work on your part duncan_roe. Unfortunately I don't have that computer with me right now but I will compile a kernel with the patch reverted and try it this weekend. I will report the results here (or on kernel.org, if you open the bug report by then. I already have an account there).