LinuxQuestions.org

hazel 01-03-2018 11:21 AM

Kernel update did something weird to X. drm issue?
 
I wasn't sure where to post this. I think the general software forum is the most appropriate.

A little background. The distro is AntiX, recently upgraded to AntiX-17 with no problems. It was still running on the old AntiX-16 kernel (4.4.10), so I installed the latest kernel in the Debian repository (4.9.0), then chose exit/reboot. Immediately I noticed something weird: the video went mad with coloured lights. Then the machine shut down and rebooted.

Everything on the reboot looked normal initially. There were no error messages, and slim (the display manager) came up normally. I logged in and the video went black. And that was that. X was borked: no display, no keyboard.

I rebooted into single-user mode and copied over the Xorg logs for the abnormal shutdown and startup. The tail of the shutdown one is attached. As you can see, there was a segfault in X.

For the new startup, it got as far as registering the mouse normally, then printed "Backtrace:" and went dead. Another segfault, I guess.

As it clearly has something to do with the new kernel, I am tentatively thinking "drm" because that's where the kernel and X intersect. And as the problem only manifests after the graphical login, there must be some transfer of ownership of the hardware at that point.

If anyone can suggest any further tests, please do, but remember that the affected machine is a different one upstairs, so I can't carry out tests instantly.

Update: I booted again, got the slim login page, but went to console instead (keyboard still works at this stage). Looked at dmesg, nothing unusual. Stopped slim by using its initscript in /etc/init.d, then tried startx. Screen went black, everything dead as before. Next time I'll try startx as root, see what that does. I'm shutting down for the day.

_roman_ 01-03-2018 03:02 PM

It looks to me more like a glibc issue, or a problem with how your X server was built (most likely a faulty build).
Off topic, just asking because I am a nerd: is there any reason why it references i386, and not i686 or amd64?

I think the output of these two, saved to a file, will help us most in pinpointing the issue.
Quote:

cat /var/log/Xorg.0.log
Quote:

dmesg
The TTY keyboard should always work. That was "init 3" in the old days. And that has nothing to do with your X server issues. Usually you see most of what the X server does in /var/log/Xorg.0.log.
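If the log is long, a few lines of Python can pull out the lines worth posting. This is only a sketch under a few assumptions:

- Most Xorg log lines start with a bracketed timestamp.
- Error lines are tagged "(EE)".
- A "Backtrace:" or "Segmentation fault" line means the server crashed.

The function name summarize_xorg_log is made up for illustration.

```python
import re

# Assumed line layout: "[   31.200] (EE) message" (timestamp optional).
# Anchoring at the start of the line skips the "(EE) error" entry in the
# marker legend near the top of the log, which begins with a tab instead.
ERROR_LINE = re.compile(r"^(?:\[\s*[\d.]+\]\s*)?\(EE\)")


def summarize_xorg_log(log_text: str) -> dict:
    """Hypothetical helper: collect (EE) lines and flag crash markers."""
    lines = log_text.splitlines()
    errors = [line for line in lines if ERROR_LINE.match(line)]
    # A server crash normally ends the log with a backtrace.
    crashed = any("Backtrace:" in line or "Segmentation fault" in line
                  for line in lines)
    return {"errors": errors, "crashed": crashed}
```

On the affected machine you would feed it the contents of /var/log/Xorg.0.log, for example `summarize_xorg_log(open("/var/log/Xorg.0.log", errors="replace").read())`. You still need to save the dmesg output separately.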

I would also consider going back to the 4.4 kernel branch. I have also had my share of bad kernels for quite a while now. Older long-term kernels are usually more mature and usable.

When I checked kernel.org a few hours ago, 4.9.73 or 4.9.74 was the latest long-term kernel. 4.9.0 is very, very old from my point of view.

hazel 01-04-2018 02:01 AM

Quote:

Originally Posted by _roman_ (Post 5801284)
It looks to me more like a glibc issue, or a problem with how your X server was built (most likely a faulty build).
Off topic, just asking because I am a nerd: is there any reason why it references i386, and not i686 or amd64?

This is a very old computer. It doesn't do 64-bit.

Quote:

The TTY keyboard should always work. That was "init 3" in the old days. And that has nothing to do with your X server issues. Usually you see most of what the X server does in /var/log/Xorg.0.log.

The TTY keyboard only works if you can get to a TTY. But once the keyboard has failed in X, you can't switch back to a console, because you need a working keyboard to do that! I can switch to a console before the failure, but not afterwards.

Quote:

I would also consider going back to the 4.4 kernel branch. I have also had my share of bad kernels for quite a while now. Older long-term kernels are usually more mature and usable.

Yes, I'll try that. I also want to try getting X up as root under the new kernel. Obviously you can't work like that on a permanent basis, but if it bypasses the X failure, then we have a permissions issue of some kind involving hardware devices. And that would switch the spotlight onto udev. AntiX-16 uses a systemd-free version of udev; AntiX-17 uses eudev. I don't know which one I have right now after my grand update, but I definitely need to find out.
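For the udev/eudev question, `dpkg -l udev eudev libudev1` shows the package states at a glance. The same check can be scripted by reading dpkg's status file. This is a rough sketch that assumes the standard Debian layout of /var/lib/dpkg/status:

- Stanzas are separated by blank lines.
- Each stanza has "Package:" and "Status:" fields.
- A fully installed package's Status ends in "installed".

The helper names are hypothetical.

```python
def installed_packages(status_text: str) -> set:
    """Return names of packages whose Status field ends in 'installed'."""
    installed = set()
    for stanza in status_text.split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            # Continuation lines (e.g. long descriptions) start with whitespace.
            if line.startswith((" ", "\t")) or ":" not in line:
                continue
            key, _, value = line.partition(":")
            fields[key] = value.strip()
        # Status is "want flag state", e.g. "install ok installed",
        # or "deinstall ok config-files" for a removed package.
        state = fields.get("Status", "").split()
        if "Package" in fields and state[-1:] == ["installed"]:
            installed.add(fields["Package"])
    return installed


def udev_flavour(status_text: str) -> list:
    """Which of the udev-related packages named in this thread are installed."""
    present = installed_packages(status_text)
    return [name for name in ("udev", "libudev1", "eudev") if name in present]
```

For example, `udev_flavour(open("/var/lib/dpkg/status").read())` lists which of the three packages are actually installed.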

There's nothing abnormal about the Xorg logs except the way they end, and I've already described that.

anticapitalista 01-04-2018 03:05 AM

Did you follow the instructions here regarding udev/eudev?

https://www.antixforum.com/forums/to...de-to-antix17/

hazel 01-04-2018 06:40 AM

I thought I had installed eudev, but apparently I hadn't. I get absent-minded these days. So I installed it and removed udev and libudev1. Unfortunately that hasn't solved the problem.

I've just tried stopping slim and running startx as root, following my hunch that this was some kind of hardware permissions problem, but that doesn't work either.

The weird thing is that if I use GRUB's advanced options menu to boot my old kernel, X works just fine and I can log in normally.

When it fails at slim login, the failure occurs just after the mouse pointer appears. With startx, I don't even see a mouse pointer.

I'm attaching a complete failed Xorg.0.log to this post. Maybe someone will see something that I can't.

anticapitalista 01-04-2018 08:08 AM

A long shot: install xserver-xorg-legacy, if you haven't already.
Otherwise I would just stick with the older, working kernel. Maybe the hardware doesn't like anything newer than 4.4.

hazel 01-04-2018 10:18 AM

Well, I finally found out what it was. But I still don't know why it made such a difference.

I had booted from the old kernel and was in Synaptic, checking for xserver-xorg-legacy. It turns out I had it installed, so I thought "End of the road! I've just got a rubbish kernel. Let's get rid of it." So I went to the kernel section and noticed that there were three versions of that kernel, and I had installed the non-PAE one. I did a quick check of /proc/cpuinfo, and it turns out my CPU does support PAE, so I removed that kernel and installed the PAE version instead. And now everything works.
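The /proc/cpuinfo check can be made mechanical. On x86 the "flags" line lists "pae" if the CPU supports Physical Address Extension. Below is a small sketch. The mapping to the Debian i386 kernel flavours "686-pae" and "686" is an assumption based on stretch's package naming (linux-image-686-pae vs linux-image-686), so check the exact names in Synaptic.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Return the CPU feature flags from the first 'flags' line of /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Keys are padded with tabs, e.g. "flags\t\t: fpu vme de pse tsc msr pae".
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def suggested_flavour(cpuinfo_text: str) -> str:
    """Assumed Debian i386 flavour names: '686-pae' if the CPU has PAE, else '686'."""
    return "686-pae" if "pae" in cpu_flags(cpuinfo_text) else "686"
```

From a shell, `grep -w pae /proc/cpuinfo` does the same job: it prints the flags line(s) if PAE is present and nothing otherwise.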

But why should it matter to X whether the kernel uses PAE?

Rocdufer 01-08-2018 12:22 AM

Before suspecting X directly, I would suspect the graphics card driver, which is a kernel module. When the driver in use does not handle the graphics card correctly, unexpected reboots can occur after entering graphics mode, for example when the screensaver starts. If your machine is 32-bit, you can expect the graphics card to be 32-bit too.
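One way to see which kernel driver is actually driving the card is to check which modules use drm. That is what the "Used by" column of lsmod shows. The sketch below reads the same information from /proc/modules, assuming its usual whitespace-separated layout: name, size, use count, comma-separated users (or "-" if none), state, address. The helper name is hypothetical.

```python
def drm_users(modules_text: str) -> list:
    """Return the modules listed as users of the 'drm' module in /proc/modules text."""
    for line in modules_text.splitlines():
        fields = line.split()
        # Field 4 is the "used by" list, e.g. "radeon,ttm,drm_kms_helper,"
        if len(fields) >= 4 and fields[0] == "drm":
            used_by = fields[3]
            if used_by == "-":
                return []
            # Drop the empty entry left by the trailing comma.
            return [name for name in used_by.split(",") if name]
    return []
```

The card-specific driver (for example radeon, nouveau or i915) normally appears in this list next to helper modules such as drm_kms_helper or ttm. Comparing the list under the old and new kernels would show whether the same driver is being loaded.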


All times are GMT -5.