LinuxQuestions.org


hazel 05-06-2024 09:16 AM

Weird kernel behaviour on my Lenovo: has anyone else seen symptoms like this before?
 
LFS 12 uses the 6.7.4 kernel. I've built plenty of kernels in my time but this one showed the strangest behaviour I've ever seen. I'm doing a bisection to see where this behaviour starts (currently it's somewhere between 6.3.4 and 6.3.7) but I'm floating an initial query here because I'd like to know if anyone has ever observed similar symptoms.

What happens is that the bootloader reports that the kernel has loaded, and then the screen freezes and the computer produces a rhythmic buzzing sound, quite different from the speaker bleeps that you get with POST errors. There are three medium-length buzzes followed by a long one, over and over. It has the feel of a diagnostic code but I haven't found any reference to coded buzz sounds anywhere. There are no visible kernel messages so nothing to get a handle on.

Any ideas?

smallpond 05-06-2024 09:52 AM

I'm unaware of any beep mechanism in the kernel, but I haven't read every line.

More likely it is being produced by the platform firmware (BIOS), so it depends on your hardware and what firmware is installed on it. There are other possibilities as well, such as this gem: https://docs.kernel.org/6.8/sound/hd...k-pc-beep.html

Please post your hardware and firmware makes and versions.

For old IBM hardware (Lenovo, these days I guess) 1 long, 3 short means unable to initialize video. For AMI it means bad memory.

hazel 05-06-2024 11:23 AM

Kernel 6.3.6 boots, so the problem lies between 6.3.6 and 6.3.7. I'll need to git clone that branch and switch to git bisect for further info.
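The bisect step described above can be rehearsed on a throwaway repository before touching the kernel tree. The following is only a sketch under stated assumptions: the scratch repository, the file names, and the tags v-good and v-bad are all made up and merely stand in for the stable tree and the v6.3.6 and v6.3.7 tags. It plants a "regression" in the fifth of eight commits and lets git bisect run locate it.

```shell
#!/bin/sh
# Toy git bisect run on a scratch repository (hypothetical names throughout).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Bisect Demo"
git config user.email "demo@example.invalid"

# Eight commits; the fifth one plants the "regression" in status.txt.
echo ok > status.txt
for i in 1 2 3 4 5 6 7 8; do
    echo "change $i" >> history.txt
    if [ "$i" -eq 5 ]; then echo broken > status.txt; fi
    git add -A
    git commit -q -m "change $i"
    if [ "$i" -eq 1 ]; then git tag v-good; fi
done
git tag v-bad

# Mark the endpoints (bad first, then good), then let git test each step.
# The test command must exit 0 for a good commit and 1 for a bad one.
git bisect start v-bad v-good >/dev/null
git bisect run sh -c '! grep -q broken status.txt' >/dev/null

# After a finished bisection, refs/bisect/bad points at the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad commit: $first_bad"
git bisect reset >/dev/null 2>&1
```

With a kernel the test cannot be scripted this way when "bad" means "does not boot", so each step would instead be built, booted, and marked by hand with git bisect good or git bisect bad.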

The noises are not beeps. They are low-frequency buzzes, and they occur well after POST. The actual boot process (POST->bootloader menu -> kernel load) goes smoothly.

The computer is a Lenovo Thinkstation, basically a laptop in a tower case. It has an external power unit, so no internal fan. The UEFI is American Megatrends. Kernel says
Code:

LENOVO 90BX0018UK/Aptio CRB, BIOS O07KT39AUS 06/18/2014
if that means anything.

hazel 05-07-2024 08:45 AM

I could do with a bit of advice as to the precise command I need to do the clone. I want to clone the branch that has the commits entered between 6.3.6 and 6.3.7, but I first need to know which is the relevant tag. There's plenty of literature online about cloning isolated git branches by using tags as identifiers, so I know the syntax of the command I shall need to enter, but I haven't found anything yet that tells me what the tags mean! For example, does the tag "v6.3.6" mean version 6.3.6 and forward (in which case this is the one I want) or does it mean the twig that ends with version 6.3.6 (in which case the one I actually need is "v6.3.7")?

I did a quick fractional clone (just 2%) of the v6.3.6 branch last night to get an idea of the size of the task, and worked out that the full amount of data I shall need to download is about 4 GB. This is more than my whole monthly allowance of peak time downloads! Fortunately my off-peak allowance (midnight to 8 am) is 30 GB per month.

This is clearly a job for the at daemon which I've never used before, but the syntax of the at command seems simple enough. I should be able to put it into the queue in the evening to run at 12:15am the next day. I often used to put on overnight batch jobs like this at work but I've never done one since I retired back in the nineties.

Any help, especially with the tag interpretation, will be welcome. Once I have a local clone, I think I can do the actual bisection on my own. In fact I've done it once before and even wrote a blog on it.

Petri Kaukasoina 05-08-2024 12:25 AM

The branch is linux-6.3.y and the tag v6.3.6 corresponds to what is in kernel 6.3.6.

hazel 05-08-2024 12:38 AM

Quote:

Originally Posted by Petri Kaukasoina (Post 6500423)
The branch is linux-6.3.y and the tag v6.3.6 corresponds to what is in kernel 6.3.6.

Brilliant! Then v6.3.7 is the tag I need to clone, because I know that 6.3.6 works. Thanks.

EDIT: Put the clone job into at and it executed after midnight this morning. I'm doing the first bisection build now. Thanks, guys.
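For anyone wanting to repeat the overnight clone, a rough sketch follows. It is not the exact command used in this thread: the target directory linux-stable and the temporary job file are assumptions, and the script only prints the at command instead of queueing anything. The branch name and repository URL are the ones given elsewhere in the thread.

```shell
#!/bin/sh
# Sketch: write a clone job for the stable branch and show how to queue it with at.
# The directory name linux-stable and the job file are assumptions.
set -e
url=https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
branch=linux-6.3.y
job=$(mktemp)

# --single-branch limits the download to history reachable from linux-6.3.y.
clone_cmd="git clone --single-branch --branch $branch $url linux-stable"
printf '%s\n' "$clone_cmd" > "$job"

# Print, rather than run, the command that would queue the job for 00:15.
echo "at -f $job 00:15"
```

at runs the job in the directory it was queued from, so the clone lands there. Once it has finished, the tags v6.3.6 and v6.3.7 should both be present in the clone, and git bisect start v6.3.7 v6.3.6 would mark the bad and good ends.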

hazel 05-14-2024 12:28 AM

It turns out that the halt and the buzzing are two different things: commits around 6.3.7 refuse to boot but don't buzz. The buzz appears later and is caused by something else.

According to git, the commit that causes the halt is this:
Code:

7511a699c2265790ccaf3f3c1c57545405627075 is the first bad commit
commit 7511a699c2265790ccaf3f3c1c57545405627075
Author: Lino Sanfilippo <l.sanfilippo@kunbus.com>
Date:  Thu Nov 24 14:55:34 2022 +0100

    tpm, tpm_tis: Request threaded interrupt handler
   
    commit 0c7e66e5fd69bf21034c9a9b081d7de7c3eb2cea upstream.
   
    The TIS interrupt handler at least has to read and write the interrupt
    status register. In case of SPI both operations result in a call to
    tpm_tis_spi_transfer() which uses the bus_lock_mutex of the spi device
    and thus must only be called from a sleepable context.
   
    To ensure this request a threaded interrupt handler.
   
    Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>
    Tested-by: Michael Niewöhner <linux@mniewoehner.de>
    Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
    Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
    Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 drivers/char/tpm/tpm_tis_core.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

But I've forgotten what I need to do next. How do I find the actual bad code? Or should I now hand this off to the kernel devs?

Petri Kaukasoina 05-14-2024 01:13 AM

This is the bad patch: https://git.kernel.org/pub/scm/linux...57545405627075. You could apply it with 'patch -R' to your LFS 6.7.4 kernel source to see whether reverting it fixes the problem.
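As an illustration of what 'patch -R' does, here is a small self-contained sketch on scratch files. Everything in it is hypothetical (the directories a, b and tree, and the file demo.c); "tree" stands in for a source tree that already contains the unwanted change, and change.patch stands in for the downloaded patch.

```shell
#!/bin/sh
# Toy patch -R demonstration on scratch files (all names are hypothetical).
set -e
work=$(mktemp -d)
cd "$work"
mkdir -p a/drivers b/drivers tree/drivers
printf 'line one\nline two\n' > a/drivers/demo.c
printf 'line one\nline two changed\n' > b/drivers/demo.c

# diff exits 1 when the files differ, so keep set -e from aborting.
diff -u a/drivers/demo.c b/drivers/demo.c > change.patch || true

# "tree" stands in for a source tree that already contains the change.
cp b/drivers/demo.c tree/drivers/demo.c
cd tree

# -p1 strips the leading a/ or b/; --dry-run checks without modifying anything.
patch -R -p1 --dry-run < ../change.patch >/dev/null
patch -R -p1 < ../change.patch >/dev/null

restored=$(cat drivers/demo.c)
echo "$restored"
```

On a real kernel tree the same two patch commands would be run from the top of the source directory. If the dry run reports failed hunks, the file has probably changed too much since the patch was written for a clean revert.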

Have you looked at this document: https://docs.kernel.org/admin-guide/...gressions.html? I guess the kernel developers are not interested in fixing bugs in kernel 6.3 or 6.7, so it could be a good idea to see if the latest mainline kernel still has the bug.

hazel 05-14-2024 01:28 AM

Thank you very much for that patch. I now have a lot to do. I want to build a kernel with your patch and test it. I also want to build an unpatched kernel with the tpm options deactivated (since I'm not using secure boot) and see what difference that makes. But what I most want to know right now is where you found that patch, so I can make a note of how to do that for the future.

The answer to your question is that I first noticed this problem after building the current LFS which uses linux-6.7.4. My other systems are Slackware and antiX which both use old stable kernels (antiX is on 6.1 I think, and Slackware-15 is still in major release 5) so you could say that I've been protected!

Petri Kaukasoina 05-14-2024 01:50 AM

I used the web interface https://git.kernel.org/pub/scm/linux...able/linux.git. In the upper right corner, 'master' -> select linux-6.3.y and click 'switch'. Then there is a menu under the penguin. Click 'tree'. Browse to drivers/char/tpm/tpm_tis_core.c as that was mentioned in your post. Then click 'log'. There is a list of commits to that file, and the second one is 'tpm, tpm_tis: Request threaded interrupt handler' as in your post. Click that. The third line has a link '(patch)' and that gives the patch in a mailbox format. I guess you could also search for it using the commit ID. Or use your git clone locally. Anyway, the URL seems to contain the pathname of the file and the commit ID, so it would be simple to just feed them to a wget command when needed.

hazel 05-14-2024 02:16 AM

Brilliant! With your permission, I'd like to put some of that into my blog on how to bisect a kernel. I wrote it many years ago after a problem with a kernel that crashed on my previous computer and I was intending to use it as a guide to this new problem, but I was really shocked at how sketchily I'd written it. It was a thoroughly careless piece of writing, not up to my usual standards. I'm expanding it now with everything I have learned from this problem.

Petri Kaukasoina 05-14-2024 02:28 AM

Sure.

It seems you can fetch the patch using the URL
Code:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/patch?id=7511a699c2265790ccaf3f3c1c57545405627075
so the pathname of the file is not needed, the commit ID is enough.
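That URL pattern can be wrapped in a small helper. This is a minimal sketch: the function name patch_url is made up, and the pattern is taken only from the example URL above, so it is an observation about the web interface, not a documented interface.

```shell
#!/bin/sh
# Build the patch URL for a commit ID (the helper name patch_url is made up).
base=https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git

patch_url() {
    printf '%s/patch?id=%s\n' "$base" "$1"
}

commit=7511a699c2265790ccaf3f3c1c57545405627075
url=$(patch_url "$commit")
echo "$url"

# To download it (network access needed), something like:
#   wget -O "$commit.patch" "$url"
```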

Petri Kaukasoina 05-14-2024 02:32 AM

And the web address for 'Linus's tree' (mainline) is https://git.kernel.org/pub/scm/linux...lds/linux.git/. (The addresses in my earlier posts were for the stable kernel.)

hazel 05-14-2024 04:30 AM

Quote:

Originally Posted by Petri Kaukasoina (Post 6501505)
(The addresses in my earlier posts were for the stable kernel.)

That's the one I've been using throughout. I assumed that Linus's tree was for developers only.

hazel 05-16-2024 02:37 AM

Update: This is not going to be as simple as I thought. I downloaded the current LFS kernel (6.7.4) and took a look at the tpm driver file that git had flagged up as a problem. I found that there has been a huge amount of development of this code since the 6.3 series, making it quite impossible to do the simple correction I was hoping for. So I decided to approach the problem from a slightly different angle and test how this recommended kernel would behave if the tpm driver simply wasn't built at all. And whaddyaknow! When I tried to boot it, it just sat and buzzed at me as before :banghead:

I think it is going to take me weeks to get to the bottom of this, especially with my restricted download capacity (though I do at least know now that I can clone small git branches using an overnight at job without crashing through my allocation limit). And I have plenty of time ahead of me.

Of course there is another darker possibility, that I'm no longer capable of configuring my own kernel properly. After all, I shall be 79 this year and it's actually a very long time since I did a kernel configuration. On the old drive I didn't have to, because I had two LFS partitions and built each new LFS out of the previous one. When it came to building the kernel, I just copied over the old config file. I've been doing that for years. But my new drive is smaller and has only one LFS partition, hence the need to configure from scratch. So I am going to do a reality check:

I have installed in Slackware-15 a recent kernel image and its modules from Slackware-current. This kernel is guaranteed to be configured properly! I will make an initrd for it using Patrick's wonderful script and set it up in elilo.conf as an alternative Slack boot. If it boots successfully, then it's my configuration which is at fault. If it halts and buzzes, then it's the kernel.

In that case, at the rate Slackware moves, I shall have at least another year to try and fix things.


All times are GMT -5.