LinuxQuestions.org


petejc 05-15-2019 05:49 PM

New kernel, 5.1 / 5.1.2 boot failure strangeness
 
I'm a bit baffled. Kernel 5.1 boots and runs fine on my desktop, but I can't get my old ThinkPad T400 to boot it from an ext4 root filesystem. 5.0.8 gives no trouble, and I'm using essentially the same lilo and kernel config for 5.0.8, 5.1 and 5.1.2, yet the latter two fail.

With the 5.1.2 boot failures, the boot messages show that it finds the drive, the partitions and the ext4 file system on /dev/sda1. It does not appear to run the init scripts. I get a login prompt for 'Darkstar' immediately but am unable to log in. I know Darkstar is an old Slackware default. I presume I can't log in because /etc/shadow or /etc/passwd might not be the normal ones. But I can't check, as I can't log into 'Darkstar', and who knows what file system it thinks it is using.

My hostname is 'laptop-maint' and /etc/HOSTNAME contains 'laptop-main.fire' not 'Darkstar' so I really wonder where this login prompt is arising from? Looking through /etc and /etc/rc.d the only place 'Darkstar' comes up is in /etc/rc.d/rc.M, but given that /etc/HOSTNAME exists and is readable that default should be ignored.

I wondered whether it was picking up an old initrd and stalling there, but none is being used. I renamed an old one just in case, no effect.

Given that /dev/sda1 is the only disk partition with a file system, /dev/sda2 being swap and /dev/sda3 being a luks volume, and there are no other disks, I really can't see where it is trying to boot from. The only thing left is whether some last-ditch init / getty functionality has been included in newer kernels, with perhaps 'Darkstar' compiled into the kernel, rather than just a panic if it fails to find a root partition, but I am not aware of such functionality.

Any thoughts as to what might be happening?

Petri Kaukasoina 05-16-2019 01:47 AM

Yes, it's in the kernel config:
Code:

CONFIG_DEFAULT_HOSTNAME="darkstar"
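For anyone following along, here is a minimal sketch for checking which fallback name a given build carries, assuming you still have the .config the kernel was built from. The helper name default_hostname is made up for illustration; only the CONFIG_DEFAULT_HOSTNAME option itself comes from the kernel config.

```shell
# default_hostname is a hypothetical helper, not part of any Slackware or kernel tool.
# It prints the value of CONFIG_DEFAULT_HOSTNAME from the kernel .config given as $1.
default_hostname() {
    # Match a line such as CONFIG_DEFAULT_HOSTNAME="darkstar" and print only the name.
    sed -n 's/^CONFIG_DEFAULT_HOSTNAME="\(.*\)"$/\1/p' "$1"
}
```

It could be called as default_hostname /usr/src/linux/.config (the path is only an example). The kernel keeps this name until something in userspace sets another one, so seeing it at the login prompt suggests the init scripts never got as far as setting the hostname.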

petejc 05-16-2019 03:01 PM

Quote:

Originally Posted by Petri Kaukasoina (Post 5995563)
Yes, it's in the kernel config:
Code:

CONFIG_DEFAULT_HOSTNAME="darkstar"

Thank you. I'm building 5.1.3 (which has just come out) with the default hostname changed, so I can tell whether it comes from the kernel. That still does not explain why root mounts yet I don't get a normal boot. Still, I will find out whether 5.1.3 fixes it for some inexplicable reason.

petejc 05-16-2019 04:48 PM

Quote:

Originally Posted by petejc (Post 5995805)
Thank you. I'm building 5.1.3 (which has just come out) with the default hostname changed, so I can tell whether it comes from the kernel. That still does not explain why root mounts yet I don't get a normal boot. Still, I will find out whether 5.1.3 fixes it for some inexplicable reason.

OK, 5.1.3, with the default hostname set to 'Pete_intel'. The kernel mounts the ext4 filesystem on /dev/sda1 but does not run the init scripts as it should, and immediately gives me a login prompt. The hostname is now 'Pete_intel', so it is picking this up from the kernel, not from /etc/HOSTNAME, which is set, nor from the default in /etc/rc.d/rc.M.
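The reasoning in this test can be written down as a small sketch. The helper name hostname_origin is hypothetical, and it assumes (as I recall from Slackware's rc.M, so treat it as an assumption) that the init scripts set the hostname to the part of /etc/HOSTNAME before the first dot.

```shell
# hostname_origin is a hypothetical helper that guesses where a hostname came from.
# $1 = hostname shown at the login prompt
# $2 = path to the HOSTNAME file, $3 = path to the kernel .config
hostname_origin() {
    running=$1
    # Assumption: the init scripts use the text before the first dot in the file.
    from_file=$(cut -d. -f1 "$2" 2>/dev/null)
    from_kernel=$(sed -n 's/^CONFIG_DEFAULT_HOSTNAME="\(.*\)"$/\1/p' "$3" 2>/dev/null)
    if [ -n "$from_file" ] && [ "$running" = "$from_file" ]; then
        echo "init scripts"
    elif [ -n "$from_kernel" ] && [ "$running" = "$from_kernel" ]; then
        echo "kernel default"
    else
        echo "unknown"
    fi
}
```

If the prompt shows the name from the kernel config while /etc/HOSTNAME holds something else, the scripts that should have set the hostname most likely never ran to completion.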

petejc 05-24-2019 12:58 PM

Quote:

Originally Posted by petejc (Post 5995850)
OK, 5.1.3, with the default hostname set to 'Pete_intel'. The kernel mounts the ext4 filesystem on /dev/sda1 but does not run the init scripts as it should, and immediately gives me a login prompt. The hostname is now 'Pete_intel', so it is picking this up from the kernel, not from /etc/HOSTNAME, which is set, nor from the default in /etc/rc.d/rc.M.

Solved, sort of.

I've built kernel 5.1.4 on a different machine and that seems to boot OK via lilo. Not sure what the difference is apart from a minor bump in version number and that I used a rather out of date copy of slackware-current to build the kernel.

Ilgar 06-04-2019 09:26 AM

Quote:

Originally Posted by petejc (Post 5998544)
Solved, sort of.

I've built kernel 5.1.4 on a different machine and that seems to boot OK via lilo.

I think I stumbled upon a seemingly similar problem. My old Dell Precision M4300 laptop running Slackware64 14.2 fails to boot with 5.1.x, although 5.0.6 was working fine. I tried 5.1.4 and 5.1.7, the first built on a relatively modern computer and the second on the Dell itself. In both cases I get the darkstar prompt. However, before it loads the modules from initrd and right after the initialization of eudev I see a "Bus error" line in the output. It appears again after the initrd module-loading messages. Do you remember if you had that error, too?

Btw, in both cases the builds were made on Slackware64 14.2.

duncan_roe 06-12-2019 06:24 AM

Me too
 
I have a similar-sounding problem. 5.1 installed fine on my laptop. But the identical kernel gave boot weirdness on the desktop. The system came up to the login prompt very quickly, without changing from the initial terminal font to a smaller one as it usually did. There was a message from INIT that "SV" was respawning too quickly. When I supplied user name and password, I just got the login prompt again.
5.0.9 works great.
I tried building with the .config from current (kernel-source-4.19.49-noarch-1.txz, file usr/src/linux-4.19.49/arch/x86/configs/x86_64_defconfig). PCI support was turned off initially, so rebuilt with it on (and SELINUX turned off). Worked slightly better, could log in but fonts were weird and X looked horrible.
Just noticed top-level usr/src/linux-4.19.49/.config - will try that.
Desktop mainboard is approx. 10 years old while the laptop is < 2 years old. Desktop has NVMe disks but they don't seem to be the problem.
Hope it's just some type of config issue but ... what?

Petri Kaukasoina 06-12-2019 07:09 AM

Quote:

Originally Posted by duncan_roe (Post 6004404)
I tried building with the .config from current (kernel-source-4.19.49-noarch-1.txz, file usr/src/linux-4.19.49/arch/x86/configs/x86_64_defconfig).

That is not what current uses. From kernel-source-4.19.49-noarch-1.txz, try file usr/src/linux-4.19.49/.config

Ilgar 06-13-2019 04:40 AM

Quote:

Originally Posted by duncan_roe (Post 6004404)
There was a message from INIT that "SV" was respawning too quickly. When I supplied user name and password, I just got the login prompt again.

In my case it said "x1" was spawning too fast. However, I believe this is not a manifestation of the main issue but only a side-effect of it. Did you see any "bus error" messages like I did? Given this message and the fact that I was unable to find similar bug reports in Google, I suspect that the culprit is some rare combination of software, such as eudev 3.1.5 (from 2015) and kernel 5.1. I would like to try updating eudev but I don't know how complicated it can get, so I am waiting for now.

duncan_roe 06-16-2019 06:49 PM

3 Attachment(s)
I have yet to notice a bus error. Where I am at now is: I do have a .config that will boot, but the resulting system is not very usable. This .config is usr/src/linux-4.19.49/arch/x86/configs/x86_64_defconfig, migrated to 5.1.8 by successive iterations of make xconfig plus some manual diffs to get device 259 recognised (NVMe disks). It is attached.
The system mis-reports /dev/sda as having 1 partition when it actually has 4 (or it might have only seen what is normally /dev/sdb - one of them is SATA and the other is IDE but they both have multiple partitions).
There is no network: ifconfig -a shows sit0 instead of eth0.
In case it's any help, I've attached dmesg o/p in that system.
There's more, but I have to go now

Ilgar 06-17-2019 03:32 AM

Thanks duncan_roe. I would still suspect eudev, since it is eudev's responsibility to create the device nodes correctly. Perhaps if I have the time I will try upgrading eudev and recompiling other stuff if necessary. This is kind of like Linux from Scratch which I have not much experience with, though, and I may give up early on. Any advice on how to do an eudev upgrade is welcome.

By the way, is there a particular reason why you did not use 'make oldconfig' to migrate your old config? It should have made the switch in one go.
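A minimal sketch of that one-step migration, assuming the old config is available as a file and the new source tree is already unpacked. The helper name migrate_config and any paths you pass to it are illustrative; the oldconfig make target is the real one.

```shell
# migrate_config is a hypothetical helper name; the make target is the real one.
# $1 = path to the old, known-good .config, $2 = top of the new kernel source tree
migrate_config() {
    # Start from the old answers ...
    cp "$1" "$2/.config" || return 1
    # ... then let the kernel build system ask only about options new to this tree.
    make -C "$2" oldconfig
}
```

If you would rather accept the default for every new option than answer prompts, make olddefconfig does that instead.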

duncan_roe 06-17-2019 09:06 PM

I was under the impression that make xconfig did an implied make oldconfig first. It has worked for me for 25 years anyway. At least it has worked for x.y -> x.y+1, not always for bigger jumps.
I was going to document my further experiences but think I have a better plan now: I'm going to git bisect between v5.0 and v5.1 until I find the culprit patch. When I do find it, I'll raise a bug report.
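For readers who have not bisected a kernel before, the session looks roughly like the sketch below. The two wrapper names are made up for illustration; the git bisect subcommands they call are real. It assumes you are inside a git clone of the kernel tree and that you build and boot-test each revision by hand between calls.

```shell
# Hypothetical wrappers around real git commands; run them inside a clone of the kernel tree.

# Begin a bisect. $1 = last revision known to boot (e.g. v5.0),
# $2 = first revision known to fail (e.g. v5.1).
bisect_begin() {
    git bisect start "$2" "$1"
}

# After building and boot-testing the revision git has checked out, record the result.
# $1 = good (booted normally) or bad (stopped at the early login prompt).
# git then checks out the next candidate, or names the first bad commit when done.
bisect_record() {
    case $1 in
        good|bad) git bisect "$1" ;;
        *) echo "usage: bisect_record good|bad" >&2; return 2 ;;
    esac
}
```

Each round halves the remaining range, so even the thousands of commits between two releases need only a dozen or so builds. git bisect log shows the verdicts recorded so far, and git bisect reset returns to the original checkout.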

duncan_roe 06-19-2019 02:01 AM

git bisect is progressing. It looks like the problem appeared around 5.1-rc7.

duncan_roe 06-19-2019 08:13 AM

Please try this patch
 
1 Attachment(s)
git bisect identified commit 459e3a21535ae3c7a9a123650e54f5c882b8fcbf as the culprit. This is the log entry:
Quote:

Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Wed May 1 11:20:53 2019 -0700

gcc-9: properly declare the {pv,hv}clock_page storage

The pvlock_page and hvclock_page variables are (as the name implies)
addresses to pages, created by the linker script.

But we declared them as just "extern u8" variables, which _works_, but
now that gcc does some more bounds checking, it causes warnings like

warning: array subscript 1 is outside array bounds of "u8[1]"

when we then access more than one byte from those variables.

Fix this by simply making the declaration of the variables match
reality, which makes the compiler happy too.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

I made an antidote to that patch, attached as revert_459e3a21.txt. When I applied it to Linux 5.1.12, the kernel booted normally. 5.1.12 had just appeared; I guess the patch should work for any 5.1.x.
Please try this patch. In your top-level Linux source directory, do
Code:

cat revert_459e3a21.txt | patch -p1
Then [re-]build the kernel.
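If you want to be sure the patch fits your tree before anything is modified, a cautious variant might look like the sketch below. The helper name try_patch is hypothetical; it relies on GNU patch's --dry-run option, and the patch file should be given as an absolute path because the helper changes directory first.

```shell
# try_patch is a hypothetical helper; --dry-run is a GNU patch option.
# $1 = top-level kernel source directory, $2 = patch file (absolute path)
try_patch() {
    # First pass changes nothing; it only reports whether every hunk would apply.
    ( cd "$1" && patch -p1 --dry-run < "$2" ) || {
        echo "patch does not apply cleanly, nothing was changed" >&2
        return 1
    }
    # Second pass applies the patch for real.
    ( cd "$1" && patch -p1 < "$2" )
}
```

A failed dry run usually means the source tree is a different version from the one the patch was made against, or that the patch has already been applied.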

2 QUESTIONS:

1. Are we all only seeing a problem on old hardware?

2. AMD, Intel or what? (both my old and new systems are AMD)

Any answers will be helpful for the bug report I now need to put together
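For anyone answering question 2, a small sketch that reads the CPU vendor from /proc/cpuinfo. The helper name cpu_vendor is made up; it assumes the x86 layout where each processor entry has a vendor_id line.

```shell
# cpu_vendor is a hypothetical helper. It prints the vendor_id of the first CPU
# listed in a cpuinfo-style file ($1, default /proc/cpuinfo),
# for example AuthenticAMD or GenuineIntel.
cpu_vendor() {
    awk -F': ' '/^vendor_id/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}
```

Alongside that, gcc --version on the build machine and uname -r on a kernel that does boot give the compiler and kernel versions, which are probably worth including as well.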

Ilgar 06-19-2019 03:45 PM

Aha, that was quick. Great work on your part duncan_roe. Unfortunately I don't have that computer with me right now but I will compile a kernel with the patch reverted and try it this weekend. I will report the results here (or on kernel.org, if you open the bug report by then. I already have an account there).

petejc 06-19-2019 04:39 PM

Quote:

Originally Posted by Ilgar (Post 6002042)
I think I stumbled upon a seemingly similar problem. My old Dell Precision M4300 laptop running Slackware64 14.2 fails to boot with 5.1.x, although 5.0.6 was working fine. I tried 5.1.4 and 5.1.7, the first built on a relatively modern computer and the second on the Dell itself. In both cases I get the darkstar prompt. However, before it loads the modules from initrd and right after the initialization of eudev I see a "Bus error" line in the output. It appears again after the initrd module-loading messages. Do you remember if you had that error, too?

Btw, in both cases the builds were made on Slackware64 14.2.

Sorry, did not notice your post until today. Sorry, did not notice.

duncan_roe 06-19-2019 07:24 PM

Bug 203935 submitted to bugzilla.kernel.org

duncan_roe 06-21-2019 08:05 PM

Bug 203935 is resolved
 
1 Attachment(s)
The problem was only seen on old AMD K8 systems (like my 10-year-old Athlon) and when building with an old GCC (e.g. 5.5.0).
I closed the bug after testing the attached fix.
Please try it yourselves.

Ilgar 06-22-2019 07:42 AM

I will test it tomorrow, but that Dell laptop of mine had an Intel CPU, not AMD.

Ilgar 06-23-2019 11:17 AM

Good news everyone! Both duncan_roe's original revert and the new upstream patch worked perfectly well.

@duncan_roe: Thank you so much for all the work you have done on this. I was not expecting this to be solved so quickly!

sebastians 06-24-2019 07:32 AM

Gotta be honest, I'm new to Slackware and I'm experimenting with my first kernel compilation ever.

I had the same exact problem on a ThinkPad T400 and version 5.1.14 of the kernel.

It suddenly stops at login with a "too fast" kind of error and Darkstar as the hostname. I could not even log in with any user account. I thought it was due to my inexperience, but then I compiled version 4.19.55 in exactly the same manner... and it works.

I tried to follow this thread, but I am still not fluent enough with kernel-related stuff to have a clear idea of how to fix this, and I can't find anything in the docs. Actually, I don't really need the latest kernel version "just because"; it's that I got onto Slackware because I want to learn it (you know, the old saying about slack...).

If some good soul is willing to help, it would be terrific on my journey to dissect and understand the whole GNU/Linux system.

Ilgar 06-24-2019 03:02 PM

Hi there sebastians,

The patch hasn't made it to the release version yet (maybe it will be included in 5.1.15). Assuming that you already know how to compile your own kernel, all you need to do is the following:

- Save the attached patch file in duncan's latest post (post #18) somewhere on your hard drive.

- Then go to /usr/src/linux-5.1.14, or whatever is the directory into which the kernel source is unpacked.

- Do
Code:

patch -p1 < [location of the patch file]
As output, it will list the files that are patched. In this case there is only one.

- Then compile your kernel as usual.

The '-p1' is about the relative depth to be used for directory names when reading the patch files' contents (just take a look at the file and it will be clear). The '<' directs the contents of the patch file into standard input (stdin), which the patch command reads.
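To make the '-p1' part concrete, here is a small sketch that mimics what patch does to each file name it reads from the patch file. The helper name strip_components and the example path are made up for illustration; the real work is done inside patch itself.

```shell
# strip_components is a hypothetical helper that mimics what patch's -pN does to a
# file name: it removes the first N slash-separated components.
# $1 = N, $2 = path as written in the patch file
strip_components() {
    n=$1
    path=$2
    while [ "$n" -gt 0 ]; do
        path=${path#*/}   # drop everything up to and including the first slash
        n=$((n - 1))
    done
    printf '%s\n' "$path"
}
```

For example, strip_components 1 a/drivers/example/foo.c prints drivers/example/foo.c. Patches made with diff or git usually prefix file names with a/ and b/, which is why -p1 is the right choice when you run patch from the top of the source tree.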

sebastians 06-24-2019 08:11 PM

Awesome! Thanks @Ilgar.

duncan_roe 06-25-2019 01:45 AM

The patch is in 5.2-rc6

Ilgar 06-25-2019 04:55 AM

Quote:

Originally Posted by duncan_roe (Post 6008748)
The patch is in 5.2-rc6

It made it to 5.1.15 as well.


All times are GMT -5.