LinuxQuestions.org
Slackware This Forum is for the discussion of Slackware Linux.

Old 10-23-2019, 03:29 PM   #46
abga
Senior Member
 
Registered: Jul 2017
Location: EU
Distribution: Slackware
Posts: 1,275

Rep: Reputation: 673

@Twigster

I stated that I'm 90% sure the cause of your hang/reboot problems is acpi and/or PM(calls) related. I'm running out of ideas and we pretty much exhausted all the possible fixes to tame your system.
I'm also "running out" of knowledge with respect to the acpi system and your next best bet for further investigation would be to engage with the kernel devs -> open a bug report @ kernel.org. You could provide the acpi tables from your /sys/firmware/acpi/tables/ and ask them to inspect/check them.
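For the bug report, the tables can be bundled into one archive. This is only a minimal sketch: it assumes you run it as root (the files are normally readable by root only), and the function name and output filename are made up for illustration.

```shell
#!/bin/sh
# collect_acpi_tables is a hypothetical helper name, not an existing tool.
# It copies every file below the ACPI tables directory into a temporary
# directory and packs the copy into a gzip-compressed tarball.
collect_acpi_tables() {
    src="${1:-/sys/firmware/acpi/tables}"
    out="${2:-acpi-tables.tar.gz}"
    [ -d "$src" ] || return 1
    tmp="$(mktemp -d)" || return 1
    # Copy with cat first instead of archiving sysfs in place.
    ( cd "$src" && find . -type f ) | while IFS= read -r f; do
        mkdir -p "$tmp/$(dirname "$f")"
        cat "$src/$f" > "$tmp/$f"
    done
    tar -czf "$out" -C "$tmp" .
    rm -rf "$tmp"
}

# Usage (as root):
#   collect_acpi_tables /sys/firmware/acpi/tables /root/acpi-tables.tar.gz
```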

I believe it's time to consider the other 10% and by going over this whole thread again, I realized that the only period when your system was truly stable was when you disabled acpi for good with the kernel parameter acpi=off. As a result your video card wasn't recognized by the system and the i915 module not loaded, the standard vesa frame buffer driver was used instead.
Apparently there are "well known" stability issues with the i915 driver and I'd suggest focusing on it for the moment.
Remove all the acpi workarounds from your lilo append line and get it back to the default (re-run /sbin/lilo afterwards so the change takes effect):
append="vt.default_utf8=0"
Then read & try the following bug reports and fixes, even if some of them relate to newer HW & kernel versions:

- ARCH Linux has a whole section of i915 fixes:
https://wiki.archlinux.org/index.php...h_intel_driver

- this manjaro user used a chain of kernel parameters (couldn't find them in the official kernel doc - https://www.kernel.org/doc/Documenta...parameters.txt ):
https://forum.manjaro.org/t/i915-gpu-hang-solved/37200

- here are some bug reports related to i915 (plenty more if you search on the net). In some cases whole system freeze & reboot with no error logging is reported, pretty much like in your case:
https://bugs.freedesktop.org/show_bug.cgi?id=102586
https://bugs.launchpad.net/ubuntu/+s...x/+bug/1535048

- I took a look at the i915 module parameters (modinfo i915) in search of some interesting ones that could help with your issue. The following are the only ones that I found useful and the first two are already enabled by default. The third one could be useful for extra logging, but then your system hangs and there's no logging at all ...
Code:
parm:           reset:Attempt GPU resets (default: true) (bool)
parm:           enable_hangcheck:Periodically check GPU activity for detecting hangs. WARNING: Disabling this can cause system wide hangs. (default: true) (bool)
parm:           verbose_state_checks:Enable verbose logs (ie. WARN_ON()) in case of unexpected hw state conditions. (bool)
 
Old 10-23-2019, 05:20 PM   #47
Nille_kungen
Member
 
Registered: Jul 2005
Posts: 508

Rep: Reputation: 178
My Dell 620 started hanging right before the nVidia Quadro died.
It was from the time when nvidia had problems with their chips dying in laptops.
 
Old 10-24-2019, 02:20 PM   #48
abga
Senior Member
 
Registered: Jul 2017
Location: EU
Distribution: Slackware
Posts: 1,275

Rep: Reputation: 673
@Twigster

In #46 I got a little confused about the i915 parameters, provided as actual kernel boot parameters, described in the manjaro post:
https://forum.manjaro.org/t/i915-gpu-hang-solved/37200
My apologies.

In the Slackware provided kernel the i915 driver is modular and the way to pass these options to the module is by creating a /etc/modprobe.d/i915.conf file with the following content:
Code:
options i915 modeset=1 enable_rc6=1 enable_fbc=1 enable_guc_loading=1 enable_guc_submission=1 enable_huc=1 enable_psr=1 disable_power_well=0 semaphores=1
 
Old 10-25-2019, 08:19 PM   #49
Twigster
Member
 
Registered: Oct 2019
Location: France
Distribution: Slackware64 14.2
Posts: 34

Original Poster
Rep: Reputation: Disabled
Hello, sorry again for the late reply.

I'm testing and editing the post as I go.

First, the Arch Linux page.
It mentioned intel_idle.max_cstate, but we already knew that intel_idle does not run on my CPU.

I did not try the X server ideas, because I thought that when X crashes/freezes, I could still use Ctrl+Alt+F1 to go to a console. But with my issue, the shortcut does not do anything.

I could not understand most of the i915 options. The only one I could figure out, enable_rc6, is already at 0 according to systool. I could confirm that reset, verbose_state_checks, and enable_hangcheck are all set to "Y" in systool.

The manjaro user has graphical issues that somehow still log data. So I gave it a go, and my computer still froze. dmesg only complained that some of the i915 parameters didn't exist, and everything else looked as before.
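The set of i915 parameters changes between kernel versions, so options copied from a thread about a newer kernel can be unknown to an older module. A rough sketch for checking a list of options against what the loaded module exposes in sysfs; the function name is hypothetical, and parameters that are not exported to sysfs will be reported as unknown even though they exist (the parm: lines of modinfo i915 are the fuller list).

```shell
#!/bin/sh
# check_i915_options is a hypothetical helper. For each name=value argument
# it reports whether a parameter file of that name exists under
# /sys/module/i915/parameters (PARAMS_DIR can override the directory).
check_i915_options() {
    params_dir="${PARAMS_DIR:-/sys/module/i915/parameters}"
    for opt in "$@"; do
        name="${opt%%=*}"
        if [ -e "$params_dir/$name" ]; then
            echo "ok       $opt"
        else
            echo "unknown  $opt"
        fi
    done
}

# Usage (with the i915 module loaded):
#   check_i915_options modeset=1 enable_rc6=1 enable_fbc=1 enable_psr=1
```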

On the gentoo bug, a file named kernel.log is mentioned. Is /var/log/messages its equivalent on Slackware? Also, I can see kernel oops messages there, so unfortunately my system does not behave quite like that.

On the launchpad bug, unfortunately the TLP suggestions require intel_pstate and that driver needs a sandy bridge or more recent CPU.
I do not understand if intel_pstate and intel_idle mean the same thing.

I learnt here about cpu governors and found that my cpus were in ondemand mode. I have not yet understood how to make these settings persistent across a reboot. Unfortunately, just echoing performance into the governors (without rebooting or anything) didn't stop the computer from freezing. I don't even know if pstates/acpi are related to the selected governors.
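On persistence: values written to sysfs are lost at reboot, so one common approach is to repeat the write at boot time, for example from /etc/rc.d/rc.local. A minimal sketch, assuming root and a loaded cpufreq driver; the function name is hypothetical. Slackware's own /etc/rc.d/rc.cpufreq may be the more conventional place for this, so check how it is set up on your install first.

```shell
#!/bin/sh
# set_governor is a hypothetical helper: it writes the given governor name
# into the scaling_governor file of every CPU that has one.
set_governor() {
    gov="$1"
    cpu_root="${2:-/sys/devices/system/cpu}"
    for f in "$cpu_root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip CPUs without a writable governor file (or an unmatched glob).
        [ -w "$f" ] && echo "$gov" > "$f"
    done
    return 0
}

# Usage (as root, e.g. from /etc/rc.d/rc.local):
#   set_governor performance
```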

Last edited by Twigster; 10-25-2019 at 09:34 PM.
 
Old 10-25-2019, 08:37 PM   #50
abga
Senior Member
 
Registered: Jul 2017
Location: EU
Distribution: Slackware
Posts: 1,275

Rep: Reputation: 673
On the Arch troubleshooting section:
"the X server ideas" describes: "Some issues with X crashing, GPU hanging, or problems with X freezing, "
If your GPU is hanging, Ctrl+Alt+F1 won't be of any help, I guess...
Then you have a workaround for "Kernel crashing w/kernels 4.0+ on Broadwell/Core-M chips", setting i915.enable_execlists=0. Try adding:
Code:
options i915 enable_execlists=0
in /etc/modprobe.d/i915.conf

I'd suggest trying anything that mentions a GPU/system hang or that you believe could cause one. Your issue is peculiar enough because you don't get anything in the kernel log, no info/clue. Be a little more flexible&creative.

Last edited by abga; 10-25-2019 at 08:46 PM. Reason: typo
 
Old 10-26-2019, 12:05 AM   #51
Richard Cranium
Senior Member
 
Registered: Apr 2009
Location: Carrollton, Texas
Distribution: Slackware64 14.2
Posts: 3,505

Rep: Reputation: 1836
@Twigster, do you have a second machine that you could use to ssh into the laptop?
 
Old 10-27-2019, 05:49 PM   #52
Twigster
Member
 
Registered: Oct 2019
Location: France
Distribution: Slackware64 14.2
Posts: 34

Original Poster
Rep: Reputation: Disabled
@Richard : I didn't think of doing that. I opened an ssh terminal, then went and made the laptop hang, and I got a PuTTY fatal error saying "network error software caused connection abort", and then I couldn't connect back.


@abga : I do not know if I should try all options together or one at a time.
I tried this in one go and it still hung:
Code:
~# cat /etc/X11/xorg.conf.d/20-intel.conf 
Section "Device"
	Identifier "Intel Graphics"
	Driver "intel"
	Option "NoAccel" "True"
	Option "DRI" "False"
	Option "AccelMethod" "sna"
EndSection
~# cat /etc/lilo.conf | grep append
append="vt.default_utf8=0 i915.enable_execlists=0"
Maybe I'll get more creative tomorrow =)

Last edited by Twigster; 10-27-2019 at 05:52 PM.
 
Old 10-27-2019, 06:55 PM   #53
Richard Cranium
Senior Member
 
Registered: Apr 2009
Location: Carrollton, Texas
Distribution: Slackware64 14.2
Posts: 3,505

Rep: Reputation: 1836
Quote:
Originally Posted by Twigster View Post
@Richard : I didn't think of doing that. I opened an ssh terminal, then went and made the laptop hang, and I got a PuTTY fatal error saying "network error software caused connection abort", and then I couldn't connect back.
Ok, that's a very hard laptop hang; I've had X go crazy and then respond to neither keyboard nor mouse but the machine was still more-or-less operational via my ssh login. (Normally less)

Umm, I've got nothing useful to add.
 
Old 10-27-2019, 07:16 PM   #54
abga
Senior Member
 
Registered: Jul 2017
Location: EU
Distribution: Slackware
Posts: 1,275

Rep: Reputation: 673
@Twigster

What Richard Cranium suggested, to connect remotely through secure shell, is useful for troubleshooting a system that has the console dysfunctional (could be also due to the graphic driver). I haven't considered this approach, mainly because in your OP you stated "the hardware network switch does nothing", which (together with the other details you provided) led me to believe that the whole system is frozen/crashed.

With my statement "Be a little more flexible&creative", I just wanted to suggest being more flexible in your understanding (of the situation & its implications) and creative in your approaches. Again, it's a really weird situation you have there and it's worth trying whatever workarounds you find related (even distantly) to your issue. At least for as long as you still have time & patience for it
(and don't have one of these handy: https://en.wikipedia.org/wiki/Sledgehammer )

I'd approach the workarounds sequentially rather than trying all of them at once. That way you can tell the ones that do something from the ones that have no effect.
TBH, I had some hopes from the i915.enable_execlists=0, sorry to hear that it doesn't help.
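To keep the one-at-a-time testing organised, the option chain can be split into separate test runs. This is only a sketch with a hypothetical function name: it prints the commands for each run instead of executing them, so you can go through them by hand from a text console with X stopped.

```shell
#!/bin/sh
# one_at_a_time is a hypothetical helper: for every option given it prints
# the unload/reload pair that tests that single option on its own.
one_at_a_time() {
    for opt in "$@"; do
        echo "/sbin/rmmod i915"
        echo "/sbin/modprobe i915 $opt"
    done
}

# Usage:
#   one_at_a_time enable_execlists=0 enable_rc6=1 enable_fbc=1
```

After each reload, try to reproduce the hang before moving on to the next option.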

Besides, you don't need to provide the driver options as kernel boot parameters if the driver is built modular. It will work, since kernel boot parameters of the form i915.name=value are passed to both built-in and modular drivers, but there are easier ways to achieve that, and if the driver is modular, a reboot is not always required.
Now, since the i915 driver is built modular in the Slackware kernel, you could boot clean and then try all these module options:
- statically, with the help of /etc/modprobe.d/i915.conf, like I suggested in #48 & #50 and only unload & reload the module.
Code:
/sbin/rmmod i915
# then add the options you want to try to /etc/modprobe.d/i915.conf and reload the module
/sbin/modprobe i915
- (an even simpler method) manually providing the module parameters. First unloading the i915 module and reloading it with the preferred parameters:
Code:
/sbin/rmmod i915
/sbin/modprobe i915 enable_execlists=0
# or - the manjaro thread stuff:
/sbin/modprobe i915 modeset=1 enable_rc6=1 enable_fbc=1 enable_guc_loading=1 enable_guc_submission=1 enable_huc=1 enable_psr=1 disable_power_well=0 semaphores=1
(you should also try that chain of module options from #48 (originally from the manjaro thread))
To check if/how the module was loaded - inspect dmesg. For identifying what parameters & values are loaded, use:
Code:
# the proper tool
/usr/bin/systool -v -m i915
# a "hack"
grep -H '' /sys/module/i915/parameters/*
Good Luck!
 
Old 10-27-2019, 07:38 PM   #55
allend
LQ 5k Club
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware-current
Posts: 5,309

Rep: Reputation: 1938
Looking at that dmesg log:
Quote:
[ 156.315043] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 156.315049] ata2.00: ST_FIRST: !(DRQ|ERR|DF)
[ 156.315063] ata2.00: cmd a0/00:00:00:08:00/00:00:00:00:00/a0 tag 0 pio 16392 in
         opcode=0x4a 4a 01 00 00 10 00 00 00 08 00
         res 50/00:01:00:08:00/00:00:00:00:00/a0 Emask 0x2 (HSM violation)
[ 156.315068] ata2.00: status: { DRDY }
[ 156.315103] ata2: soft resetting link
[ 156.542282] ata2.00: configured for UDMA/33
[ 156.553518] ata2: EH complete
I think that hard drive is headed for the ewaste pile.
If you have any important data, back it up!
 
Old 10-27-2019, 08:06 PM   #56
abga
Senior Member
 
Registered: Jul 2017
Location: EU
Distribution: Slackware
Posts: 1,275

Rep: Reputation: 673
@allend

I've noticed those and suggested to check the HDD connection through posts #8 & #9. Then in post #16 OP managed to crash the system without a HDD connected (booting from USB).
Are you sure those "* exception Emask * frozen", "soft resetting link" and " HSM Violation" are signs for a failing HDD? Maybe they are caused by the fact that the system is not operating in AHCI mode. OP is not able to set the SATA Mode - mentioned in post #12.

The only info I could find about these exceptions - doesn't necessarily describe a failing HDD:
https://unix.stackexchange.com/quest...-to-solve-them
https://askubuntu.com/questions/1339...rors-dangerous
HSM Violation:
https://www.kernel.org/doc/htmldocs/...atHSMviolation
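To judge whether these events line up with the hangs, it may help to count how often each ATA port logs them. A small sketch with a hypothetical function name; it only counts lines containing "exception Emask" in whatever kernel log text is fed to it.

```shell
#!/bin/sh
# count_ata_errors is a hypothetical helper: it reads kernel log text on
# stdin and prints "<device> <count>" for every ATA device that logged an
# "exception Emask" line.
count_ata_errors() {
    grep -o 'ata[0-9.]*: exception Emask' \
        | cut -d: -f1 \
        | sort \
        | uniq -c \
        | awk '{print $2, $1}'
}

# Usage:
#   dmesg | count_ata_errors
#   count_ata_errors < /var/log/messages
```

It is also worth confirming which device ata2.00 actually is (dmesg | grep 'ata2.00: ATA' should show the identification line from boot). The logged command a0 is the ATA PACKET command and opcode 0x4a is normally sent to optical drives, so the messages may not concern the disk at all. For the disk itself, smartctl -H from smartmontools gives a quick health verdict; the device name to pass (for example /dev/sda) is an assumption about this system.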
 
Old 10-27-2019, 09:24 PM   #57
allend
LQ 5k Club
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware-current
Posts: 5,309

Rep: Reputation: 1938
@abga - No, I am not sure. But random hangs accompanied by log messages about hard drive errors preceding total hard disk drive failure I can confirm from personal experience on more than one occasion.

As an aside, I have a nephew who is good at the local equivalent of dumpster diving. Resuscitating old laptops has been a bit of a hobby. These laptops have generally been running Windows, and the process of cleaning them out and installing updates is a good disk stress test. Hard disk problems would explain why they were discarded. If the hardware makes it worthwhile, installing a new SSD gets a usable laptop with significant performance gains.
 
Old 10-28-2019, 09:46 AM   #58
Twigster
Member
 
Registered: Oct 2019
Location: France
Distribution: Slackware64 14.2
Posts: 34

Original Poster
Rep: Reputation: Disabled
Hello allend,

The laptop I own has had its original HDD replaced by a used kingston 64GB ssd in 2012 or 2013 (can't remember). It was fully functional (but slow) before the disk swap, and then didn't have any faults during its time on Windows XP.
The disk may be toast, but I've got nothing of value to lose on it.
Besides, I managed to successfully boot the drive on another PC without issues.

@abga : thanks for helping me understand the module subtleties. When using systool, how do you know if the module is loaded? refcount I guess?
In any case, my system didn't like rmmod -f i915 (because the module was in use) and the screen went black. Ctrl+Alt+Backspace did nothing. I could do it that way, but at that point rebooting is less bothersome.


As my modus operandi to reproduce the problem is to open a PDF in okular, I will attempt to strace it thru ssh and see if there is a pattern.

Here are two strace results on the okular process that I'm interacting with when the laptop hangs :
trace 1
trace 2

I would like to gather a trace of the kernel itself, if such a thing is possible.

Last edited by Twigster; 10-28-2019 at 10:52 AM.
 
Old 10-28-2019, 12:05 PM   #59
abga
Senior Member
 
Registered: Jul 2017
Location: EU
Distribution: Slackware
Posts: 1,275

Rep: Reputation: 673
@Twigster

Using lsmod (lsmod | grep i915) is the easiest way to check if a module is loaded.
Well, if X is started it's obvious that the i915 module is in use. All my suggestions/instructions about the i915 module should be executed on console without X running.
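On the refcount question: for a loadable module the kernel exposes the use count in /sys/module/<name>/refcnt, which is the same number lsmod prints in its "Used by" column. A small sketch with a hypothetical function name that reports both whether the module is present and whether it is still in use:

```shell
#!/bin/sh
# module_status is a hypothetical helper: it looks at /sys/module/<name>
# and reports whether the module is loaded and what its use count is.
module_status() {
    mod="$1"
    sys_root="${2:-/sys/module}"
    if [ ! -d "$sys_root/$mod" ]; then
        echo "$mod: not loaded"
    elif [ -r "$sys_root/$mod/refcnt" ]; then
        echo "$mod: loaded, refcnt=$(cat "$sys_root/$mod/refcnt")"
    else
        # Directory present but no refcnt file, e.g. a built-in driver.
        echo "$mod: present without refcnt (e.g. built in)"
    fi
}

# Usage:
#   module_status i915
```

A refcnt above 0 means something still holds the module, and rmmod will refuse to unload it.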

If you want to have dmesg (kernel log) constantly showing updates, you could open a tty (text mode console) and dedicate it for this purpose, run:
Code:
/bin/dmesg -wT
It might be useful to open a SSH from a remote system and monitor the kernel messages (again with /bin/dmesg -wT ), maybe you can catch something interesting during the crash, something that isn't written in the logs.
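Since the laptop cannot save anything once it is frozen, it makes sense to store the stream on the second machine as it arrives. A minimal sketch to run on the remote (healthy) machine; the function name, the log filename and the hostname "laptop" are placeholders.

```shell
#!/bin/sh
# stamp_and_save is a hypothetical helper: it copies stdin to the terminal
# and appends it to a file, prefixing every line with the local time at
# which it arrived, so the last message before a hang stays on disk.
stamp_and_save() {
    out="${1:-kernel-messages.log}"
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%H:%M:%S')" "$line"
    done | tee -a "$out"
}

# Usage on the second machine ("laptop" is a placeholder hostname):
#   ssh root@laptop /bin/dmesg -wT | stamp_and_save crash.log
```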

Ctrl+Alt+Backspace should end the X session (shutdown X Server) in a normal situation.
If you have an active SSH connection to your system, and the system isn't frozen (totally), you could kill X (as root):
Code:
/usr/bin/killall xinit
Couldn't find anything interesting in your okular traces.

I have no experience in debugging the kernel, and I believe you have to prepare it first: enable some extra debugging options in its configuration and recompile it.
Some reading:
https://ownyourbits.com/2018/05/09/d...-linux-kernel/
https://elinux.org/Kernel_Debugging_Tips
https://www.kernel.org/doc/html/v4.1...ools/kgdb.html
 
Old 10-28-2019, 04:43 PM   #60
enorbet
Senior Member
 
Registered: Jun 2003
Location: Virginia
Distribution: Slackware = Main OpSys for decades while testing others to keep up
Posts: 2,483

Rep: Reputation: 2528
I mentioned earlier that I rarely use a laptop and the number one reason why is HEAT! I wouldn't have mentioned this until I saw Nille Kungen's post regarding failure of nVidia Quadro which is almost certainly a heat issue. So I web searched and see that Dell 620s are notoriously HOT! and mostly from two issues - 1) A constriction point that collects dust quickly , and 2) Horrible glob of thermal paste factory installed by default. Incidentally heat can also cause hdd hangs. I've seen posts by people who logged common CPU temps of 100C during very light loads. This is unacceptable and dangerous to hardware and software by extension.

I strongly recommend you set up lm_sensors and run its setup script "sensors-detect" if you haven't already. Since yours apparently does not have a separate chip for graphics but has it combined in a single CPU/GPU chip, it is extremely likely your temps are extreme and quite possibly the cause of hard hangs.
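To see whether temperature climbs right before a hang, the readings can be logged every few seconds and flushed to disk. A rough sketch that reads the kernel's thermal zones directly; it assumes the laptop exposes /sys/class/thermal/thermal_zone*/temp (reported in millidegrees Celsius), which may not be the case here, and the function name is hypothetical. The sensors command from lm_sensors is the alternative once sensors-detect has been run.

```shell
#!/bin/sh
# read_temps is a hypothetical helper: it prints one line per thermal zone
# with the temperature converted from millidegrees to whole degrees Celsius.
read_temps() {
    root="${1:-/sys/class/thermal}"
    for z in "$root"/thermal_zone*; do
        [ -r "$z/temp" ] || continue
        t=$(cat "$z/temp")
        echo "$(basename "$z") $((t / 1000))C"
    done
}

# Usage: log every 5 seconds and sync so the last reading survives a hang.
#   while :; do
#       echo "$(date '+%T') $(read_temps | tr '\n' ' ')" >> temps.log
#       sync
#       sleep 5
#   done
```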

It is essential to clear air passages. It is best to have as thin a film of thermal grease as possible, not thick, not baked, a thin greasy film with hard physical contact between source (chips) and heatsink(s). It takes an hour or so but it isn't "rocket surgery". Here .... https://www.youtube.com/watch?v=Bm7KWt87eT0 ... is a good example of what it takes. Properly done D620s commonly do not exceed 60C with heavy loads like kernel compiling. Heat is not merely uncomfortable. It is the enemy of electronics.
 
  

