[SOLVED] My Dell D620 laptop hangs/freezes randomly on slackware64 14.2
Yeah, the laptop is definitely dodgy; I plugged the drive into another computer and could boot from it and use it fine. Unfortunately that other computer is AMD-based, so the intel_idle cstate setting should be meaningless there.
I managed to freeze the laptop without its drive from the liveUSB, which makes things even weirder. I'm testing RAM again now. I'm still really puzzled by the processor.max_cstate having no effect.
In post #11 I advised you to disable "Device-Initiated Power Management" for your HardDrive in the BIOS settings, have you tried that?
The problem you report, with lilo not passing the processor.max_cstate=1 kernel boot parameter, is indeed puzzling. I just tried it now and it works on a standard 14.2 Slackware installation (no multilib).
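One thing worth ruling out here: lilo only reads /etc/lilo.conf when /sbin/lilo is run, so an edited append line has no effect until the boot loader has been reinstalled. Below is a minimal sketch for checking what the running kernel actually received. The helper name has_boot_param is hypothetical; /proc/cmdline is the standard kernel interface.

```shell
#!/bin/sh
# Sketch: check whether a boot parameter reached the running kernel.
# has_boot_param is a hypothetical helper name, not part of any package.

has_boot_param() {
    # $1 = parameter to look for, e.g. processor.max_cstate=1
    # $2 = file holding the command line (defaults to /proc/cmdline)
    param=$1
    file=${2:-/proc/cmdline}
    [ -r "$file" ] || return 2
    # Put each word on its own line, then require an exact whole-line match
    tr ' ' '\n' < "$file" | grep -qxF -- "$param"
}

# After editing /etc/lilo.conf: run /sbin/lilo as root, reboot, then check.
if has_boot_param processor.max_cstate=1; then
    echo "processor.max_cstate=1 is on the kernel command line"
else
    echo "processor.max_cstate=1 was not passed to the kernel"
fi
```

If the parameter is missing from /proc/cmdline, the problem is on the boot loader side rather than in the kernel's handling of the option.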
According to this post: https://access.redhat.com/articles/65410
I also added idle=poll
My lilo.conf append line (note that I removed the space before vt.default_utf8=0):
processor.max_cstate is listed as supported in the kernel documentation: https://www.kernel.org/doc/Documenta...parameters.txt
and it doesn't look to be dependent on the CPU pm driver (intel-pstate or acpi-cpufreq)
I just checked: on the older Atom CPU where I did the tests I'm using acpi-cpufreq. intel-pstate works only on newer processors:
If on your system intel-pstate is loaded by the kernel, you can disable it by adding the intel_pstate=disable kernel boot parameter, acpi-cpufreq will be used instead.
Finally, you should try using an older second-hand HDD, a spinning one. You could use one of the older drives that were destroyed by Windows 7 (notorious for that). Usually a zone at the beginning of the drive, the first 20-30GB, is filled with bad sectors due to the swap file. If you skip that zone and define the partitions after the first, say, 50-100GB, then you can use the drive for a very long time. I have a few such drives that are working perfectly.
Here is more info about loaded modules. According to this, I do not have anything regulating CPU frequency and it should stay at the maximum of 1.6Ghz all the time.
/proc/cpuinfo shows nothing on the "power management" line, do you get an entry with your atom CPU?
Quote:
Finally, you should try using an older second-hand HDD, a spinning one. You could use one of the older drives that were destroyed by Windows 7 (notorious for that). Usually a zone at the beginning of the drive, the first 20-30GB, is filled with bad sectors due to the swap file. If you skip that zone and define the partitions after the first, say, 50-100GB, then you can use the drive for a very long time. I have a few such drives that are working perfectly.
Thing is, the drive seems to be ruled out as the cause, since a) it works fine when used as a boot drive on another PC and b) the laptop can freeze without any drive inserted, running only from the liveUSB.
It's been a while since I played with Core 2 CPUs (I retired them all and am only running newer Core "i" systems now), but AFAIK a performance scaling driver was running on these Core 2 systems: acpi-cpufreq should load by default on older Intel CPUs. Note that these drivers are not built as external modules, at least not in the huge kernel I'm usually running, therefore you won't see them with lsmod.
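Since a built-in driver does not show up in lsmod, sysfs is a more reliable place to look. A minimal sketch, assuming the standard cpufreq sysfs layout; scaling_driver_for is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: report which cpufreq scaling driver (if any) handles a CPU.
# scaling_driver_for is a hypothetical helper name.

scaling_driver_for() {
    # $1 = CPU directory, e.g. /sys/devices/system/cpu/cpu0
    f="$1/cpufreq/scaling_driver"
    if [ -r "$f" ]; then
        cat "$f"
    else
        # No cpufreq directory means no scaling driver is bound to this CPU
        echo "none"
    fi
}

scaling_driver_for /sys/devices/system/cpu/cpu0
```

On a system where acpi-cpufreq is active this should print acpi-cpufreq; "none" would be consistent with no scaling driver being active, for example with SpeedStep disabled in the BIOS.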
The only very old system I own and still running (doesn't want to die) is an Atom N270, pretty much the same age as your laptop, and on this system I'm using the SpeedStep technology (enabled in BIOS) and have acpi-cpufreq active and handling the CPU clock.
You did disable SpeedStep in your BIOS and maybe that's why you don't have a performance scaling driver activated. I didn't recommend disabling SpeedStep, but was only focusing on the c_states, speculating that the CPU entering a deep c_state might make the system unstable.
I only advised to disable the Device-Initiated Power Management, which you did and it didn't help.
On idle=poll, I believe it's related to cpuidle and not intel_idle. In one of your older dmesg logs you have:
Code:
[ 0.129008] cpuidle: using governor ladder
[ 0.133007] cpuidle: using governor menu
I have only now read your post #16 and realized that your system (MB, CPU and RAM) is behaving weirdly. Before #16 I was still considering the drive (SSD) and its ATA-AHCI standard to be a possible cause of the instability.
One thing I'd try is to remove the CD-ROM (I believe it's PATA on your old system), that unit is affecting the SATA controller behavior and could also be the cause of instability. Just a try ... before you dump that system.
P.S. Actually you could temporarily remove everything that's modular and easy to dismount, the WiFi module comes to mind. Try to narrow down the issue, replace the RAM modules too. Run it on batteries only or on DC Adapter only. etc.
(On those older systems you could even replace the CPU - it wasn't soldered on the MB, but had a thin plastic CPU socket)
Last edited by abga; 10-13-2019 at 06:49 PM.
Reason: P.S.
should have an effect no matter whether the power driver is intel_idle or acpi_cpufreq. Now my issue is that I do not know how to confirm which cstate parameter is taken into account by acpi-cpufreq.
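One way to look at this from the running system is the cpuidle sysfs tree, which names the active idle driver and lists the idle states the kernel registered for each CPU. A minimal sketch, assuming the standard cpuidle sysfs layout; list_cstates is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: show the active cpuidle driver and the idle states exposed for a CPU.
# list_cstates is a hypothetical helper name.

list_cstates() {
    # $1 = CPU directory (defaults to cpu0)
    cpu=${1:-/sys/devices/system/cpu/cpu0}
    for s in "$cpu"/cpuidle/state*; do
        [ -r "$s/name" ] || continue
        # Prints e.g. "state1 C1"
        printf '%s %s\n' "${s##*/}" "$(cat "$s/name")"
    done
}

drv=/sys/devices/system/cpu/cpuidle/current_driver
if [ -r "$drv" ]; then
    echo "cpuidle driver: $(cat "$drv")"
fi
list_cstates
```

If the max_cstate limit is being honored, one would expect no states deeper than C1 in the list. Treat that as a rough check only, since the exact output depends on the kernel version and on which idle driver is in use.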
Usually the kernel will inform you that a chosen parameter could not be considered, the code should be mature enough to handle exceptions properly.
I wish I could help you more with the CPU PM states, but I'm also using the internet to learn about particularities. The last time I did some more investigation was in this thread (you might find the discussion and some of the links I posted useful): https://www.linuxquestions.org/quest...or-4175637326/
You said that your system was working well under Windows XP, and that made me believe that there could be some PM-related issues under Linux (kernel and drivers) that are making your system unstable. First I considered the CPU c-states, just because I found references on the internet about them causing instabilities.
In parallel to the c-states I was also thinking to advise you to disable all the PM related features in the kernel (acpi), but I considered the kernel code sound enough and advised you to play with the BIOS first (disable all PM there).
I was looking for Linux experiences on the D620 and found a few interesting links on the internet, out of which only one reports instability issues like yours: http://seclab.cs.stonybrook.edu/seka...untu-d620.html
" Resumption fails once in a while -- may be once in 20 times. It is rare enough that it does not seem to bother me. In fact, it seems no more frequent than random lockups I experience once in a while that require a reboot. (May be once in 10 days.) I wonder if these lockups have any thing to do with bugs reported in Core 2 duo processors --- I bring this up because the lockups leave absolutely no error messages or indication of any thing at all going on at the time of lockup. "
Given the above observation, I'd suggest starting your system with acpi disabled (apic & lapic too). Your new lilo.conf append line should look like:
I mentioned in an older post that I used such a D620 system at a company I worked for. It was loaded with Windows XP, but I remember some of my colleagues chose to have Linux on theirs and I don't recall any stability issues. I stayed on Windows XP because there were some tools that we were using for testing and configuring the products we developed and deployed, and those tools were coded for and worked better under Windows...
Last edited by abga; 10-14-2019 at 06:31 AM.
Reason: you=your
I'm still reading your links (you are a true detective, I tried searching before posting here and never found those). I tried the append line you mentioned and now I no longer have the power manager applet in xfce (I guess that's normal if we do not load any power manager program), but my screen is stuck at 1024x768.
We can see that the module "i915.ko" does not load now. How do I get more information about what is happening?
I do not see what links power management and the display driver that would cause such a problem
It's also worth noting that I do not have any problems with hibernation.
EDIT: this is what happens when I try to load the module manually:
Code:
:/lib/modules/4.4.190/kernel/drivers/gpu/drm/i915# insmod i915.ko
insmod: ERROR: could not insert module i915.ko: Unknown symbol in module
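The insmod message itself is terse, but the kernel log normally names each symbol it could not resolve. Note also that insmod loads only the single file it is given, while modprobe i915 loads the module's dependencies first. A minimal sketch for pulling the missing symbol names out of the log; unknown_symbols is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: extract the names of unresolved symbols from kernel log text.
# unknown_symbols is a hypothetical helper name.

unknown_symbols() {
    # Reads log text on stdin; prints each missing symbol name once
    sed -n 's/.*Unknown symbol \([^ ]*\).*/\1/p' | sort -u
}

# dmesg may need root on some systems, so failures are ignored here
if command -v dmesg >/dev/null 2>&1; then
    dmesg 2>/dev/null | unknown_symbols || true
fi
```

If the listed symbols turn out to be ACPI functions, that would be consistent with the module failing because the kernel was booted with acpi=off.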
In the Xorg.log you posted, the laptop display is enumerated with the native resolution of 1024x768, have no clue why it doesn't see it as 1280 x 800.
Disabling acpi for good is a radical solution that can have some other unwanted consequences. I advised you to do it just because I wanted to check if it's the acpi that's causing the stability issues. For the moment live with the 1024x768 and just observe your system, if it's stable now we could try some "softer" fixes. One would be to play with the acpi_osi kernel boot parameter (we could try acpi_osi=Linux ), some detailed info:
- look for the acpi_osi= section in: https://www.kernel.org/doc/Documenta...parameters.txt
- and here some additional explanations: https://unix.stackexchange.com/quest...ight-vendor-do
And some HW could not be properly identified/initialized - drivers/modules might need some manual tuning/loading: https://en.wikipedia.org/wiki/Acpi#Architecture
"As ACPI also replaces PnP BIOS, it also provides a hardware enumerator, mostly implemented in the Differentiated System Description Table (DSDT) ACPI table. "
OK, I will use the system some more and see if the hangs still happen. Thing is, I cannot reproduce the hangs easily. Now that we suspect either i915.ko or something acpi-related, are there specific actions I can take to stress those parts of my system and make it more likely to hang?
Well, you disabled acpi. I warned you in #23 that this is a radical solution and that it might have some other unwanted consequences. Three such consequences are already visible:
1. The Intel graphic adapter is not recognized by your system - a possible workaround - see: https://www.kernel.org/doc/Documenta...parameters.txt
- in section acpi=, last line "See also.." try adding to your actual lilo kernel append line: pci=noacpi
2. Since your system doesn't recognize the Intel GPU, your X server is defaulting on the vesa frame buffer - just took a look again at your Xorg.log
3. The errors you got from your manual attempt (insmod i915.ko) are acpi-related and expected, since the kernel is instructed to drop acpi support with the acpi=off parameter.
Just keep observing your system with the acpi turned off and if you don't experience any hangs/reboots, then we could try some "softer" measures, like cancelling the acpi=off and playing with acpi_osi= , although it'll be a sort of trial and error territory and I cannot guarantee any success.
If your system is stable now with acpi=off, I'd suggest adding pci=noacpi first (as advised in #27). The odds of success are higher, and if pci=noacpi sorts out your graphics adapter, then you have a solution. As I mentioned, playing with acpi_osi= is guesswork...
Hello, sorry for not having answered quicker, I actually got back to this laptop only today.
Here's what I tested so far :
append="vt.default_utf8=0" -> crash
append="vt.default_utf8=0 noapic nolapic acpi=off" -> no crash no video driver
append="utf8xx acpi=off" -> no video driver no crash
append="utf8xx acpi=off pci=noacpi" -> no vid no crash
append="utf8xx pci=noacpi" -> no boot
append="utf8xx acpi_osi=Linux" -> video ok, no crashes so far