LinuxQuestions.org

Slackware forum thread: New kernel, 5.1 / 5.1.2 boot failure strangeness (https://www.linuxquestions.org/questions/slackware-14/new-kernel-5-1-5-1-2-boot-failure-strangeness-4175653980/)

petejc 05-15-2019 05:49 PM

New kernel, 5.1 / 5.1.2 boot failure strangeness
 
I'm a bit baffled. Running and booting kernel 5.1 on my desktop is fine. However, I can't get my old ThinkPad T400 to boot from its ext4 filesystem. I've had no issue with 5.0.8, and I'm using essentially the same lilo and kernel config for 5.0.8, 5.1 and 5.1.2, yet the latter two fail.

With 5.1.2, the boot messages show that it finds the drive, the partitions and the ext4 filesystem on /dev/sda1. However, it does not appear to run the init scripts. I immediately get a login prompt for 'Darkstar' but am unable to log in. I know Darkstar is an old Slackware default. I presume I can't log in because /etc/shadow or /etc/passwd might not be the normal ones, but I can't check, since I can't log into 'Darkstar', and who knows which filesystem it thinks it is using.

My hostname is 'laptop-maint' and /etc/HOSTNAME contains 'laptop-main.fire', not 'Darkstar', so I really wonder where this login prompt comes from. Looking through /etc and /etc/rc.d, the only place 'Darkstar' comes up is /etc/rc.d/rc.M, but since /etc/HOSTNAME exists and is readable, that default should be ignored.

I wondered whether it was picking up an old initrd and stalling there, but none is being used. I renamed an old one just in case, with no effect.

Given that /dev/sda1 is the only partition with a filesystem (/dev/sda2 is swap, /dev/sda3 is a LUKS volume, and there are no other disks), I really can't see where else it could be trying to boot from. The only remaining possibility I can think of is that newer kernels include some last-ditch init / getty functionality, perhaps with 'Darkstar' compiled into the kernel, rather than just panicking when they fail to find a root partition, but I am not aware of any such functionality.

Any thoughts as to what might be happening?

Petri Kaukasoina 05-16-2019 01:47 AM

Yes, it's in the kernel config:
Code:

CONFIG_DEFAULT_HOSTNAME="darkstar"
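To confirm which name a given kernel build will fall back to, the setting can be read straight out of that build's .config. A minimal sketch; the function name kernel_default_hostname is made up for illustration, and only the option name comes from the post above:

```shell
#!/bin/sh
# Hypothetical helper: print the CONFIG_DEFAULT_HOSTNAME value from a
# kernel .config file (defaults to ./.config).
# Example line it matches:  CONFIG_DEFAULT_HOSTNAME="darkstar"
kernel_default_hostname() {
    cfg=${1:-.config}
    # Print only the text between the quotes on the matching line
    sed -n 's/^CONFIG_DEFAULT_HOSTNAME="\(.*\)"$/\1/p' "$cfg"
}
```

For example, kernel_default_hostname /usr/src/linux/.config. To change the value in a source tree, the kernel's own scripts/config helper should work, e.g. scripts/config --set-str DEFAULT_HOSTNAME laptop-maint followed by make olddefconfig. The kernel default only shows up when nothing in userspace (normally rc.M) sets the hostname afterwards.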

petejc 05-16-2019 03:01 PM

Quote:

Originally Posted by Petri Kaukasoina (Post 5995563)
Yes, it's in the kernel config:
Code:

CONFIG_DEFAULT_HOSTNAME="darkstar"

Thank you. I'm building 5.1.3 (which has just come out) with the default hostname changed, so I can tell whether the name comes from the kernel. That still doesn't explain why root mounts but I don't get a normal boot. Still, I'll find out whether 5.1.3 fixes it for some inexplicable reason.

petejc 05-16-2019 04:48 PM

Quote:

Originally Posted by petejc (Post 5995805)
Thank you. I'm building 5.1.3 (which has just come out) with the default hostname changed, so I can tell whether the name comes from the kernel. That still doesn't explain why root mounts but I don't get a normal boot. Still, I'll find out whether 5.1.3 fixes it for some inexplicable reason.

OK, 5.1.3 with the default hostname set to 'Pete_intel': the kernel mounts the ext4 filesystem on /dev/sda1 but does not run the init scripts as it should, and immediately gives me a login prompt. The hostname is now 'Pete_intel', so it is picked up from the kernel, not from /etc/HOSTNAME (which is set) nor from the default in /etc/rc.d/rc.M.

petejc 05-24-2019 12:58 PM

Quote:

Originally Posted by petejc (Post 5995850)
OK, 5.1.3 with the default hostname set to 'Pete_intel': the kernel mounts the ext4 filesystem on /dev/sda1 but does not run the init scripts as it should, and immediately gives me a login prompt. The hostname is now 'Pete_intel', so it is picked up from the kernel, not from /etc/HOSTNAME (which is set) nor from the default in /etc/rc.d/rc.M.

Solved, sort of.

I've built kernel 5.1.4 on a different machine and that seems to boot OK via lilo. I'm not sure what the difference is, apart from the minor version bump and the fact that I built the kernel on a rather out-of-date copy of slackware-current.

Ilgar 06-04-2019 09:26 AM

Quote:

Originally Posted by petejc (Post 5998544)
Solved, sort of.

I've built kernel 5.1.4 on a different machine and that seems to boot OK via lilo.

I think I have stumbled upon a similar problem. My old Dell Precision M4300 laptop running Slackware64 14.2 fails to boot with 5.1.x, although 5.0.6 was working fine. I tried 5.1.4 and 5.1.7, the first built on a relatively modern computer and the second on the Dell itself. In both cases I get the darkstar prompt. However, right after eudev initializes and before the modules are loaded from the initrd, I see a "Bus error" line in the output. It appears again after the initrd module-loading messages. Do you remember whether you had that error, too?

Btw, in both cases the builds were made on Slackware64 14.2.

duncan_roe 06-12-2019 06:24 AM

Me too
 
I have a similar-sounding problem. 5.1 installed fine on my laptop, but the identical kernel booted strangely on the desktop. The system came up to the login prompt very quickly, without changing from the initial terminal font to a smaller one as it usually did. There was a message from INIT that "SV" was respawning too quickly. When I supplied user name and password, I just got the login prompt again.
5.0.9 works great.
I tried building with the .config from current (kernel-source-4.19.49-noarch-1.txz, file usr/src/linux-4.19.49/arch/x86/configs/x86_64_defconfig). PCI support was turned off initially, so I rebuilt with it on (and SELINUX turned off). That worked slightly better: I could log in, but the fonts were weird and X looked horrible.
Just noticed top-level usr/src/linux-4.19.49/.config - will try that.
The desktop mainboard is about 10 years old, while the laptop is less than 2 years old. The desktop has NVMe disks, but they don't seem to be the problem.
I hope it's just some kind of config issue, but ... what?

Petri Kaukasoina 06-12-2019 07:09 AM

Quote:

Originally Posted by duncan_roe (Post 6004404)
I tried building with the .config from current (kernel-source-4.19.49-noarch-1.txz, file usr/src/linux-4.19.49/arch/x86/configs/x86_64_defconfig).

That is not what current uses. From kernel-source-4.19.49-noarch-1.txz, try file usr/src/linux-4.19.49/.config
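Pulling just that file out of the package, without unpacking the whole kernel source tree, might look like the sketch below. It assumes the member path is usr/src/linux-&lt;version&gt;/.config as quoted above (a leading ./ is tried as a fallback, in case the package stores paths that way) and that tar auto-detects the .txz (xz) compression when reading, as GNU tar does. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extract the .config from a Slackware kernel-source
# package without unpacking the rest of the source tree.
# $1 = package file, $2 = kernel version, $3 = output file
extract_pkg_config() {
    pkg=$1 ver=$2 out=$3
    member="usr/src/linux-$ver/.config"
    # -O writes the extracted member to stdout; try the path without
    # and then with a leading "./"
    { tar -xOf "$pkg" "$member" 2>/dev/null ||
      tar -xOf "$pkg" "./$member"; } > "$out"
}
```

For example, extract_pkg_config kernel-source-4.19.49-noarch-1.txz 4.19.49 config-4.19.49, then copy config-4.19.49 to .config in the new source tree and run make oldconfig there to answer only the questions for options added since 4.19.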

Ilgar 06-13-2019 04:40 AM

Quote:

Originally Posted by duncan_roe (Post 6004404)
There was a message from INIT that "SV" was respawning too quickly. When I supplied user name and password, I just got the login prompt again.

In my case it said "x1" was respawning too fast. However, I believe this is not a manifestation of the main issue but only a side effect of it. Did you see any "bus error" messages like I did? Given this message, and the fact that I was unable to find similar bug reports on Google, I suspect the culprit is some rare combination of software, such as eudev 3.1.5 (from 2015) and kernel 5.1. I would like to try updating eudev, but I don't know how complicated that could get, so I am waiting for now.

duncan_roe 06-16-2019 06:49 PM

3 Attachment(s)
I haven't noticed a bus error yet. Where I am now: I have a .config that boots, but the resulting system is not very usable. This .config is usr/src/linux-4.19.49/arch/x86/configs/x86_64_defconfig, migrated to 5.1.8 by successive runs of make xconfig plus some manual diffs to get device 259 (the NVMe disks) recognised (attached).
The system mis-reports /dev/sda as having 1 partition when it actually has 4 (or it might only have seen what is normally /dev/sdb; one disk is SATA and the other is IDE, but both have multiple partitions).
There is no network: ifconfig -a shows sit0 instead of eth0.
In case it's any help, I've attached the dmesg output from that system.
There's more, but I have to go now

Ilgar 06-17-2019 03:32 AM

Thanks, duncan_roe. I would still suspect eudev, since it is eudev's responsibility to create the device nodes correctly. If I have the time, I will try upgrading eudev and recompiling other things as necessary. This is a bit like Linux From Scratch, which I don't have much experience with, so I may give up early. Any advice on how to do an eudev upgrade is welcome.

By the way, is there a particular reason why you did not use 'make oldconfig' to migrate your old config? It should have made the switch in one go.
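Whichever front end does the migration, it helps to see which options actually changed between the old and new files. The kernel tree ships scripts/diffconfig for this; the following is a minimal stand-alone sketch of the same idea using sort and comm (the function name config_diff is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: list option lines that differ between two kernel
# .config files. "-" lines exist only in the old file, "+" only in the new.
# Both "CONFIG_FOO=..." and "# CONFIG_FOO is not set" lines are compared.
config_diff() {
    old=$1 new=$2
    t1=$(mktemp) t2=$(mktemp)
    pattern='^(CONFIG_|# CONFIG_[A-Za-z0-9_]+ is not set)'
    # comm needs both inputs sorted in the same collation order
    grep -E "$pattern" "$old" | LC_ALL=C sort > "$t1"
    grep -E "$pattern" "$new" | LC_ALL=C sort > "$t2"
    LC_ALL=C comm -23 "$t1" "$t2" | sed 's/^/- /'
    LC_ALL=C comm -13 "$t1" "$t2" | sed 's/^/+ /'
    rm -f "$t1" "$t2"
}
```

Running config_diff old.config .config after make oldconfig (or make xconfig) shows exactly which options the migration flipped, so a dropped option such as CONFIG_PCI stands out immediately.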

duncan_roe 06-17-2019 09:06 PM

I was under the impression that make xconfig did an implied make oldconfig first. It has worked for me for 25 years, anyway. At least, it has worked for x.y -> x.y+1, though not always for bigger jumps.
I was going to document my further experiences, but I think I have a better plan now: I'm going to git bisect between v5.0 and v5.1 until I find the culprit patch. When I find it, I'll raise a bug report.
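For reference, the manual cycle is git bisect start v5.1 v5.0, then build and boot each candidate and mark it with git bisect good or git bisect bad until git names the first bad commit. Where the test can be scripted, git bisect run automates that loop. A minimal sketch (the function name is hypothetical; for a boot failure like this one the test is usually manual, so this mainly shows how the pieces fit together):

```shell
#!/bin/sh
# Hypothetical helper: bisect between a known-good and a known-bad revision
# using a scripted test, then print the first bad commit.
# $1 = good rev, $2 = bad rev, remaining args = test command, which must
# exit 0 for good, 125 to skip, or any other code from 1 to 127 for bad.
# Run it from the top of a clean git checkout.
bisect_first_bad() {
    good=$1 bad=$2
    shift 2
    git bisect start "$bad" "$good" > /dev/null || return
    git bisect run "$@" > /dev/null
    # When bisection finishes, refs/bisect/bad names the first bad commit
    first_bad=$(git rev-parse refs/bisect/bad)
    git bisect reset > /dev/null 2>&1
    echo "$first_bad"
}
```

Each bisection step halves the remaining range, so even the thousands of commits between two kernel releases take only a dozen or so build-and-boot rounds.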

duncan_roe 06-19-2019 02:01 AM

git bisect is progressing; it looks like the problem appeared around 5.1-rc7.

duncan_roe 06-19-2019 08:13 AM

Please try this patch
 
1 Attachment(s)
git bisect identified commit 459e3a21535ae3c7a9a123650e54f5c882b8fcbf as the culprit. This is the log entry:
Quote:

Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Wed May 1 11:20:53 2019 -0700

gcc-9: properly declare the {pv,hv}clock_page storage

The pvlock_page and hvclock_page variables are (as the name implies)
addresses to pages, created by the linker script.

But we declared them as just "extern u8" variables, which _works_, but
now that gcc does some more bounds checking, it causes warnings like

warning: array subscript 1 is outside array bounds of "u8[1]"

when we then access more than one byte from those variables.

Fix this by simply making the declaration of the variables match
reality, which makes the compiler happy too.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I made an antidote to that patch (a patch that reverts it), attached as revert_459e3a21.txt. When I applied it to Linux 5.1.12 (which had just appeared), the kernel booted normally. I'd expect it to work for any 5.1.x.
Please try this patch. In your top-level Linux source directory, run
Code:

patch -p1 < revert_459e3a21.txt

then [re-]build the kernel.
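A slightly more cautious way to apply the revert is to dry-run it first, so that a patch which doesn't apply cleanly leaves the tree untouched. A minimal sketch; the function name apply_patch_checked is hypothetical, while --dry-run and --forward are standard GNU patch options:

```shell
#!/bin/sh
# Hypothetical helper: apply a -p1 patch (such as revert_459e3a21.txt)
# only if a dry run succeeds, so a failing patch changes nothing.
# --forward makes patch skip (rather than offer to reverse) hunks that
# look already applied. Run it from the top-level kernel source directory.
apply_patch_checked() {
    patchfile=$1
    if patch -p1 --forward --dry-run < "$patchfile" > /dev/null; then
        patch -p1 --forward < "$patchfile"
    else
        echo "patch does not apply cleanly: $patchfile" >&2
        return 1
    fi
}
```

For example, apply_patch_checked ../revert_459e3a21.txt. In a git checkout of the kernel, git revert 459e3a21535ae3c7a9a123650e54f5c882b8fcbf should have the same effect, provided later commits haven't touched the same lines.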

2 QUESTIONS:

1. Are we all only seeing a problem on old hardware?

2. AMD, Intel or what? (both my old and new systems are AMD)

Any answers will be helpful for the bug report I now need to put together.
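For the second question, the CPU vendor and model can be read from /proc/cpuinfo. A minimal sketch that prints both on one line (the function name is hypothetical; the optional argument lets it read a saved copy of cpuinfo instead):

```shell
#!/bin/sh
# Hypothetical helper: one-line summary of CPU vendor and model, suitable
# for pasting into a bug report. Reads /proc/cpuinfo unless a file is given.
cpu_summary() {
    f=${1:-/proc/cpuinfo}
    # x86 lines look like "vendor_id<TAB>: AuthenticAMD"; keep the first CPU
    vendor=$(sed -n 's/^vendor_id[[:space:]]*: //p' "$f" | head -n 1)
    model=$(sed -n 's/^model name[[:space:]]*: //p' "$f" | head -n 1)
    printf '%s | %s\n' "$vendor" "$model"
}
```

Running cpu_summary; uname -r on each affected machine gives the vendor (AuthenticAMD, GenuineIntel, ...), the model name and the running kernel version in one go.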

Ilgar 06-19-2019 03:45 PM

Aha, that was quick. Great work, duncan_roe. Unfortunately I don't have that computer with me right now, but I will compile a kernel with the commit reverted and try it this weekend. I will report the results here (or on kernel.org, if you have opened the bug report by then; I already have an account there).


All times are GMT -5.