Login hang when using custom kernel
I am trying to install Slackware 15.0, with the 5.15.19 kernel.
The HUGE kernel boots, and the user can log in. I have built several custom kernels, with full features, but without some hardware drivers and without some file systems. They all have Login stalls. Every Login attempt hangs. Some time later a timeout occurs and the Login finishes. There are no other apparent consequences directly related to the stall.

I have already gone over the CONFIG listings from HUGE and the custom kernels, and have not found anything that looks like it could be a cause. I have even tried enabling a few things in the kernel in an effort to diagnose this that way. The expected time to complete the diagnosis with such a technique is daunting, and may not be finite. I should try to compile a HUGE kernel myself and see if that is any different.

--- DBUS ---

From a DBUS diagnostic: at user login, a DBUS message is issued, and it times out. I do not know of a way to translate DBUS back to something useful, like an indication of which program to blame.

What I need to know is which kernel feature is needed, and it would be interesting to know whether it is DBUS, or some other low level daemon, that needs it. Is there anybody who can read this DBUS trace, or who just already knows this?

I have not figured out how to use the DBUS tools to test DBUS on the new system, as there are no good examples to work from. My attempts either error out immediately due to my syntax errors, or just sit there and run without doing anything. Not knowing any good source and destination names for testing has been hampering my efforts.

Thank you for your attention so far. After this point there is just a listing.

In this listing:
- This is the output of dbus-monitor, while a Login attempt was made at another console.
- Certain private information has been changed.
- Some lines marked >>>> have been added to denote important places.
- I do understand that a message to a service named "org.freedesktop.login1" has failed, but I do not recognize that name, nor the naming format. Probably it is a name that was registered in DBUS by some daemon. But which one, and what is its problem?
- I have not found anything that seems relevant in dmesg or any system logs.
- Whatever is failing is apparently failing silently.
- Whatever is failing is apparently not necessary, as the Login eventually finishes.

Code:
bash-5.1# dbus-monitor --system |
Quote:
Probably your "custom" kernel has disabled some features required by elogind. How about starting with a stock .config from the generic or huge kernels from Slackware? |
Thank you.
No sign of elogind. The system has agetty, which runs "login" when logging in. The "login" process is stalled, and DBUS reports that it does not answer. The problem could be with "login". After a long delay, the login does finish.

There is a DBUS test tool, but I am not even close to figuring out how to test DBUS with it. It seems to be for testing other programs' DBUS interfaces.

I have already tried 4 times to fix this by expanding the options included in the kernel config. I have run out of options that seem even remotely relevant.

I have recompiled the large kernel using its original config, just to confirm it is not the compiler. The recompiled large kernel works and I can log in. Next is to try compiling a custom kernel for some generic CPU, just to ensure that there is not a compiler bug for my specific CPU.

This kind of testing is going to go on forever, as there are just too many kernel options to test. What would be helpful is to know which kernel options are necessary to have DBUS and login working. Does anyone have an example of a small minimal config where login and DBUS work? At least I could compare it to what I am trying. |
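One way to shorten this kind of hunt is to list only the options that the working config enables and the custom config does not, instead of reading both files side by side. The following is a minimal sketch, not a tested recipe for this system: `enabled_opts` and `opts_lost` are hypothetical helper names, the paths in the usage comment are placeholders, and it assumes the usual .config line format (`CONFIG_FOO=y`, `CONFIG_FOO=m`, or `# CONFIG_FOO is not set`).

```shell
# Hypothetical helper: print the names of all options set to =y or =m
# in the config file given as $1, sorted for use with comm.
enabled_opts() {
    sed -n 's/^\(CONFIG_[A-Za-z0-9_]*\)=[ym]$/\1/p' "$1" | LC_ALL=C sort -u
}

# Hypothetical helper: print options enabled in the working config ($1)
# that are absent or "not set" in the custom config ($2).
opts_lost() {
    a=$(mktemp) && b=$(mktemp) || return 1
    enabled_opts "$1" > "$a"
    enabled_opts "$2" > "$b"
    # comm -23 keeps lines that appear only in the first file.
    LC_ALL=C comm -23 "$a" "$b"
    rm -f "$a" "$b"
}

# Example (both paths are placeholders):
#   opts_lost /boot/config-huge-5.15.19 /usr/src/linux/.config
```

The output is still long for a heavily trimmed kernel, but it only contains candidates, and driver and file system entries can be skimmed past quickly.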
The Gentoo kernel config has a custom patch that allows choosing OpenRC or systemd. It does nothing by itself; it just manages a set of options for the user. Maybe you should get the Gentoo sources and see what options it turns on when you choose OpenRC.
I can post my Gentoo .config if you wish. |
BTW, I think you are barking up the wrong tree. DBUS is not required to run a Linux system. However, if you use dynamic devices with udev or similar then you probably must have tmpfs mounted during boot for /dev.
|
Quote:
This is the kernel that is recommended for everyday use, and it's already pretty slimmed down. You can download the .config file here:
64-bit: https://mirrors.slackware.com/slackw...ic-5.15.19.x64
32-bit: https://mirrors.slackware.com/slackw...mp-5.15.19-smp
Quote:
If you have installed 15.0 then you have elogind. |
Re-enabling options in the kernel config will not work in some circumstances, e.g. if there is a dependency. In such a case, enabling the first option will not enable the second.
Because the default kernel works, you will have to disregard your current custom kernel and start fresh. Your problems with dbus and elogind may suggest that you somehow messed up IPC (SysV IPC)? Disabling hardware options does nothing to dbus (my config is pure Intel, no AMD), and I have only ext4 and fat32 enabled. None of this has any effect on login. As someone mentioned, if you have a fairly current config from another distro, you can re-use it (accept the changes). I had a single config working in Gentoo, Slackware, Artix and Venom. |
I have spotted the process elogind in the listing of "ps -A".
Previous listings were much longer, and it may have been off the page. I was trying to use htop, and was sorting to get the latest processes at the top. I do not know about the relevance. It may be the target of the DBUS message to login1, I cannot tell. Is there a way to ask DBUS who owns a message address name?

DBUS has a host of support utilities, but they are like figuring out a swiss-army-knife in the dark with gloves on. You know it probably can do wonderful things if you can get it open, but you also know it can cut you if you do it wrong. I am having enough of a learning experience already without adding triage on a wounded DBUS.

Anyway, the latest compile of the custom kernel made login work again. I do not know why, because I enabled about 20 more kernel options. Some were things like switching from SLOB to SLUB, and I don't know why that got selected. When I got to the library and crypto section, I made everything that was not selected into modules.

Supposedly .. and I know supposing anything will lead to problems like this, but the kernel instructions lead me to believe this ... those library and crypto options are ONLY for kernel modules, and external programs cannot access them. When the kernel needs one, the make procedure will enable it as a requirement. The only reason for having these user options is if there is an external module that might need one or more of the support functions. At least that is what the kernel instructions lead me to believe.

1. Normally the login sequence is so fast I cannot even see what processes were running. I cannot even switch consoles fast enough. But when it was hanging, there was a login process hanging too.
2. There are several candidates for who was actually the cause of the hanging.
   login - because of the above
   elogind - because it may be the target of the login1 message
   gnome-keyring - because it gets started for every user at around the same time
3. Any one of those could be using a crypto function. How they are accessing a kernel crypto function is a mystery. There were no hints that they were exposed as system calls.
4. One of those could be invoking some kernel operation that needed a crypto function that was missing. In this case, there is a bug in the kernel config checking, as any such requirement should have enabled the inclusion of the needed crypto function automatically. If this is the case, it would need to be fixed.

Because of the above mysteries, I am NOT closing this as solved. Other users can fall into this trap, whatever it is, and therefore I still have an interest in determining a better answer than I have now. I will post more information here if I figure out anything more.

Any information on this login sequence, and what kernel options it requires, would help, and would be appreciated. |
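On the question of asking DBUS who owns a name: the bus daemon itself answers the `GetNameOwner` and `GetConnectionUnixProcessID` methods of the `org.freedesktop.DBus` interface, and `dbus-send` can call them. The sketch below wraps those two calls. The function names and the `DBUS_SEND` override variable are hypothetical, and it assumes the system bus is running and the caller is allowed to talk to it.

```shell
# Hypothetical wrappers around dbus-send.
# DBUS_SEND can be overridden (e.g. with "echo") for a dry run.
DBUS_SEND="${DBUS_SEND:-dbus-send}"

# Ask the bus daemon which unique connection name owns a well-known name.
bus_name_owner() {
    "$DBUS_SEND" --system --print-reply \
        --dest=org.freedesktop.DBus /org/freedesktop/DBus \
        org.freedesktop.DBus.GetNameOwner "string:$1"
}

# Ask the bus daemon for the process ID of the connection behind a name.
bus_conn_pid() {
    "$DBUS_SEND" --system --print-reply \
        --dest=org.freedesktop.DBus /org/freedesktop/DBus \
        org.freedesktop.DBus.GetConnectionUnixProcessID "string:$1"
}

# Example, on the affected system:
#   bus_name_owner org.freedesktop.login1
#   bus_conn_pid   org.freedesktop.login1
```

The process ID from the second call can then be looked up with `ps -p` to see which daemon registered the name.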
I did look at /proc/modules, to see what module was being used that was not there before.
I have not spotted anything new, which just deepens the mystery. This invites the reasoning that whatever kernel option change made the difference, it was not one that builds a module. I shall be looking over that modules list 2 or 3 more times, and much more carefully. I am having trouble believing that reasoning. |
The problem was that login would hang.
The biggest red flag was that DBUS had an outstanding message to login1, and when DBUS finally timed out, the login would complete. The hang was due to a DBUS message not getting a response, and the sender was hung until DBUS timed out and returned some error response. That message must not have been necessary for login to proceed, because the login did finally complete even after the timeout.

I have not determined why the target process did not respond to the message. It is solely due to the kernel, as all tests are performed on the same installation with only a different kernel selected by LILO. The huge kernel would succeed, but it carries around junk and produces junk messages in dmesg. ( junk = stuff that I do not use and would never use, like RAID, hotplug, and lots of old ISA cards ) |
Maybe related:
With elogind came a dependency on cgroups. You don't actually have to enable any of the cgroup sub-options, but you need the option itself enabled. If not, you get a hang/stall on login. |
Verified.
elogind requires the kernel option CONFIG_CGROUPS.
I also enabled CGROUP CPU, just to have it do something.

In the menuconfig this is marked "When in doubt, N", which is likely to catch more victims. Most everybody is going to need to log in. This kernel option is NEW, which makes it difficult to understand why elogind would make it a requirement already.

I doubt anyone is going to fix anything, regardless of who I notify, so this thread is likely the only rescue for other people who fall victim to this "gotcha". I have tried before to get a single line of warning included, and I only got weeks of frustration.

It would help the most if the kernel menuconfig would include the line "Required by elogind, which many distributions are using". As the distributions that are using elogind are not going to be modifying the kernel configs with customized warnings, there is very little that they can do. A distribution could put into their kernel source package a file with the CONFIG settings that they know are required for the standard packages and daemons that they use in their distribution. |
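For anyone who wants to check a custom config for this before rebooting into it, a small sketch follows. `config_has` and `check_cgroups` are hypothetical helper names and the path in the usage comment is a placeholder. The check assumes the standard .config line format.

```shell
# Hypothetical helper: succeed if option $2 is built in (=y) in config file $1.
config_has() {
    grep -q "^$2=y\$" "$1"
}

# Report whether the config given as $1 has the option elogind needs.
check_cgroups() {
    if config_has "$1" CONFIG_CGROUPS; then
        echo "CONFIG_CGROUPS is enabled"
    else
        echo "CONFIG_CGROUPS is missing: expect the elogind login stall"
        return 1
    fi
}

# Usage (path is a placeholder):
#   check_cgroups /usr/src/linux/.config
# On a running kernel, the presence of /proc/cgroups is another hint:
#   [ -e /proc/cgroups ] && echo "cgroup support is present"
```

This only tests the one option confirmed in this thread; it says nothing about other options elogind or login might want.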
Thank you for your attention.
Thank you to rogan for a timely answer. I have no idea how rogan knew that. |
Glad it helped :)
I think I found out about it here on LQ after having issues like yours sometime during the 15.0 development cycle. |
Thank you rogan. I did try searching LQ, several times, but that search system either finds a dozen irrelevant things, or if you try to add more terms, it just returns that it could not find any relevant results.
( For those that feel they have a lecture on how to use the LQ search, DO NOT put it here. Put it somewhere where others can find it when they need it. Again, put any HOWTO search lecture in a thread on HOW TO SEARCH, not here. This thread is closed, I am done with it. )

The Slackware config files do not have any explanation for why those options were selected or disabled. They are little help at all in diagnosing a problem like this, which I know since I was reading the Slackware supplied config files for 2 weeks looking for a clue.

It was a generic kernel config that I started with years ago. I have every reason to believe it would have ended up exactly the same. There was no reason for me to keep CGROUPS, as it looks like a solution to a problem that I do not have. It looks like a processor grouping like they use on those massively parallel supercomputers, to allow one user to have a bunch of processors, and some other user to have another bunch. And the config help says "If in doubt, No". There was no reason for me to suspect that there was already a system executable that was dependent upon it. I do have kernel options enabled that I don't think I should need, but at least they look like something (from the kernel option help) that might be used by some program I might use, maybe.

The support file I described would require, at minimum:
1. The name of the required kernel option.
2. What feature of the distribution is known to require that kernel option.
For instance, if it turns out that KMail is the feature that requires a particular kernel option, that does not require me to enable that kernel option, because I am not going to be running KMail.

I am quite sure that line of warning, which I doubt would get added anyway, would be reworded to whatever the kernel people consider to be true. You are free to argue it with them, as I will not.
This thread is CLOSED. |
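The two-field support file described above could be as simple as one line per requirement: the option name, then the feature that needs it. A sketch of a checker for such a file follows. The file format and the name `check_requirements` are hypothetical; no distribution is known to ship such a file, and the only pairing confirmed in this thread is CONFIG_CGROUPS for elogind.

```shell
# Hypothetical requirements file format, one entry per line:
#   CONFIG_NAME  feature that needs it
# Blank lines and lines starting with '#' are ignored.
check_requirements() {
    # $1 = requirements file, $2 = kernel .config to check
    missing=0
    while read -r opt feature; do
        case "$opt" in ''|'#'*) continue ;; esac
        # Accept the option whether it is built in (=y) or a module (=m).
        if ! grep -q "^$opt=[ym]\$" "$2"; then
            echo "$opt is not enabled (needed by: $feature)"
            missing=1
        fi
    done < "$1"
    return "$missing"
}

# Usage (both paths are placeholders):
#   check_requirements /usr/src/linux/distro-requirements.txt /usr/src/linux/.config
```

Because each line names the feature, a user who does not run that feature can see at a glance that the warning does not apply to them, which is the point of the second field.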