LinuxQuestions.org


cynwulf 04-26-2013 02:51 AM

Hard lock up when system is inactive
 
I don't remember when this problem started, but I never had it with 13.37 or any other distro before that.

There is no problem while the system is in use; it can keep going for hours. I also play some (native) games on it and the lock-up doesn't happen then.

The problem always occurs when I leave it alone for about an hour and presumably power management kicks in.

It occurs irrespective of whether X is running. Once it happened before login: the terminal was still visible but completely frozen.

Using the nvidia driver or nouveau makes no difference.

Whether I use the generic kernel, my own build of 3.2.37 or a 3.8.4 I have been trialling makes no difference either.

Magic SysRq has no effect.
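When Magic SysRq appears to do nothing, one thing worth ruling out first is that it is simply disabled. A minimal sketch, assuming the standard /proc/sys/kernel/sysrq interface (the helper name sysrq_status is made up for this example):

```shell
#!/bin/sh
# Report whether the magic SysRq key is enabled, based on /proc/sys/kernel/sysrq:
#   0 = disabled, 1 = all functions enabled, other values = bitmask of allowed functions.
# sysrq_status is a hypothetical helper name used only for this example.
sysrq_status() {
  file=${1:-/proc/sys/kernel/sysrq}
  if [ ! -r "$file" ]; then
    echo "unknown"   # e.g. kernel built without CONFIG_MAGIC_SYSRQ
    return 2
  fi
  value=$(cat "$file")
  case "$value" in
    0) echo "disabled" ;;
    1) echo "enabled" ;;
    *) echo "partial ($value)" ;;
  esac
}

echo "Magic SysRq: $(sysrq_status)"
```

If it reports "disabled", running `echo 1 > /proc/sys/kernel/sysrq` as root enables all functions until the next reboot. If SysRq is enabled and still does nothing during a freeze, that suggests the kernel itself has stopped handling the keyboard, not just X.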

There is nothing in any logfiles that I can see, but suggestions as to where else to look are very welcome.

I would suspect hardware - but in that case the system should be freezing under load (?) and it isn't.

The machine in question is nothing special, it's an old P4M800 based motherboard with a Pentium 4 3.4GHz, 1.5GB RAM and an AGP 7300GT graphics card. The wireless is a PCI BCM4318 (using b43 + the firmware).

I'm running Slackware 14 with fluxbox and no display manager.

Any ideas/pointers much appreciated.

business_kid 04-26-2013 03:15 AM

I would suspect power management. Has it gone into standby, suspend or hibernate? Disable power management and see if it still occurs. It's about the only way a PC falls on its sword while doing nothing.

cynwulf 04-26-2013 06:05 AM

I'll have a look at the power management settings in the BIOS again and try to disable as much as possible. Thanks.

Toutatis 04-26-2013 06:10 AM

Is it reachable from another machine with ssh?

cynwulf 04-26-2013 06:15 AM

I'm not sure as I don't have another machine available to try... but I might be able to get something set up.

Would trying to ssh in be worthwhile considering that not even sysrq works when it locks up?

business_kid 04-26-2013 11:23 AM

I would also scour your window manager for power management settings. On this
Quote:

Would trying to ssh in be worthwhile considering that not even sysrq works when it locks up?
the answer is yes. I'm battling lockups atm and the power off button runs an orderly shutdown, even though the keyboard is dead.
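Whether the box still answers on the network during a freeze is informative: ping replies come from the kernel's network stack, so a box that still answers is not completely dead. A rough sketch to run on a second Linux machine that logs whether the problem machine answers a ping; TARGET (192.168.1.10), INTERVAL and the script name pingwatch.sh are placeholders, not values from this thread:

```shell
#!/bin/sh
# Log whether another machine answers pings, so the time of a lock-up can be
# pinned down. TARGET and INTERVAL are placeholders: override them from the
# environment, e.g.  TARGET=mybox INTERVAL=30 ./pingwatch.sh watch
TARGET=${TARGET:-192.168.1.10}
INTERVAL=${INTERVAL:-60}   # seconds between checks

# Print "up" if HOST answers a single ping within 2 seconds, else "down".
check_once() {
  if ping -c 1 -W 2 "$1" >/dev/null 2>&1; then
    echo "up"
  else
    echo "down"
  fi
}

# Loop forever, printing one timestamped status line per check.
watch_host() {
  while :; do
    echo "$(date '+%Y-%m-%d %H:%M:%S') $1 $(check_once "$1")"
    sleep "$INTERVAL"
  done
}

# Only start the endless loop when asked to, so the functions can be reused.
if [ "${1:-}" = "watch" ]; then
  watch_host "$TARGET"
fi
```

Running it as `./pingwatch.sh watch >> ping.log` and comparing the first "down" line with when the machine was last used would show whether the network dies at the same moment as the console.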

kikinovak 04-26-2013 11:52 AM

I vaguely remember a problem a while back with xscreensaver and the video driver. To be on the safe side, try to uninstall xscreensaver and see if the problem persists.

Code:

# removepkg xscreensaver
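Before removing it, it may be worth checking whether xscreensaver is installed at all and whether it is actually running. A small sketch, assuming Slackware's convention of recording installed packages as files under /var/log/packages (the helper names pkg_installed and is_running are made up for this example):

```shell
#!/bin/sh
# pkg_installed NAME [DIR]: succeed if a Slackware package record for NAME exists.
# Records are named NAME-VERSION-ARCH-BUILD, so this is a rough prefix match
# (a package called e.g. NAME-extras would also match).
pkg_installed() {
  pkgdir=${2:-/var/log/packages}
  for f in "$pkgdir/$1"-*; do
    [ -e "$f" ] && return 0   # the glob matched a real file
  done
  return 1                    # no match: the glob stayed literal
}

# is_running NAME: succeed if a process with exactly this name exists.
is_running() {
  pgrep -x "$1" >/dev/null 2>&1
}

if pkg_installed xscreensaver; then echo "xscreensaver: installed"; else echo "xscreensaver: not installed"; fi
if is_running xscreensaver; then echo "xscreensaver: running"; else echo "xscreensaver: not running"; fi
```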

edorig 04-26-2013 01:24 PM

caravel: Your problem might be caused by power management. Maybe your computer has some issues with suspend to RAM or suspend to disk. I would advise you to back up your important data, and then try the following.
1) As root, type echo disk > /sys/power/state
This triggers a suspend to disk. The memory of your computer will be written to the swap partition (which should be about twice your RAM for this to work correctly), and the computer will power off. Try to bring the computer back with the power button and see if it hangs during the boot process.
If the above does not give a hard lock, then try the following.
2) As root, type echo mem > /sys/power/state
This triggers a suspend to RAM. The devices of your computer are put into low-power mode, the screen is turned off and the processor is stopped, but the RAM remains powered. Try to bring your computer back to a working state by pressing the power button. Again, you may get a hard lock.

Problems with suspend to RAM are generally caused by the BIOS failing to restore power to the screen, or corrupting the video memory. If your problems are indeed caused by suspend to RAM, you should have a look at the kernel documentation in the power/ directory, in particular the files basic-pm-debugging.txt and s2ram.txt. The ACPI-HOWTO can also be a valuable source of information.
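Before trying either step, the kernel can be asked which sleep states it offers: /sys/power/state lists them separated by spaces (for example "standby mem disk"). A minimal sketch, assuming that interface; state_supported is a hypothetical helper name:

```shell
#!/bin/sh
# state_supported STATE [FILE]: succeed (0) if STATE is listed in FILE,
# fail (1) if it is not, return 2 if FILE cannot be read.
state_supported() {
  file=${2:-/sys/power/state}
  [ -r "$file" ] || return 2
  for s in $(cat "$file"); do
    [ "$s" = "$1" ] && return 0
  done
  return 1
}

# Read-only check: nothing is written to /sys/power/state here.
for state in standby mem disk; do
  if state_supported "$state"; then
    echo "$state: supported"
  else
    echo "$state: not offered"
  fi
done
```

If mem is not in the list, `echo mem > /sys/power/state` cannot succeed; the kernel rejects the write, typically with "No such device".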

cynwulf 04-26-2013 03:41 PM

Quote:

Originally Posted by business_kid (Post 4939374)
I'm battling lockups atm and the power off button runs an orderly shutdown, even though the keyboard is dead.

I have to hold the power button to force shutdown.

Quote:

Originally Posted by kikinovak (Post 4939390)
I vaguely remember a problem a while back with xscreensaver and the video driver. To be on the safe side, try to uninstall xscreensaver and see if the problem persists.

Code:

# removepkg xscreensaver

Well, xscreensaver wasn't running, but I removed it anyway.
Quote:

Originally Posted by edorig (Post 4939465)
caravel: Your problem might be caused by power management. Maybe your computer has some issues with suspend to RAM or suspend to Disk.

Suspend to disk appears to work, shuts down, etc but when I boot up it's no different to a normal boot.

Suspend to RAM does not seem to be supported:
Code:

# echo mem > /sys/power/state
bash: echo: write error: No such device

I tried

Code:

# echo standby > /sys/power/state

That worked fine...

Oddly enough, after the second reboot from suspend to disk, it just locked up again while I was logging in... this is the first time it has ever frozen while I've been sitting in front of it.

Thanks for the help, much appreciated. I will have a look at the BIOS settings next. Not sure when I'll be able to try to ssh into the box: it will involve either borrowing a laptop... or installing something on my better half's Windows PC...

mrclisdue 04-26-2013 08:36 PM

...you could simply ping from the better half's Windows box, or there are ssh-in-a-browser sites you could try, without installing ssh...

Also, I had a similar issue back in the 12.2 days with a single older box and never did find a solution. However, if you have access to another monitor, you may get better results with it.

Also, from a terminal, see if xset -dpms accomplishes anything.

cheers,
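To test the DPMS theory without touching the BIOS, the X server's own blanking can be switched off for the current session. A sketch, assuming it is run from a terminal inside X (so DISPLAY is set); disable_x_blanking is a hypothetical helper name:

```shell
#!/bin/sh
# Switch off the X server's idle power saving for the current session:
#   xset -dpms  disables DPMS (monitor standby/suspend/off)
#   xset s off  disables the X server's built-in screen saver (blanking)
# disable_x_blanking is a hypothetical helper name.
disable_x_blanking() {
  [ -n "${DISPLAY:-}" ] || return 1   # no X display to talk to
  xset -dpms && xset s off
}

if disable_x_blanking; then
  xset q | grep -A 2 'DPMS'   # show the resulting DPMS section
else
  echo "could not change X settings (run this from a terminal inside X)"
fi
```

These settings only last until X restarts; putting the same two xset commands in the X session's startup script (for example ~/.xinitrc, before the window manager is started) would make them stick.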

mlslk31 04-26-2013 09:26 PM

Here at work, I use PuTTY on Windows, which you can get from here:

http://www.chiark.greenend.org.uk/~sgtatham/

BTW, his puzzle collection compiles on slackware-current without needing non-Slackware packages.

Otherwise, I'm staying out of this one. I have multiple 32-bit PCs that have your issue. I call it "console blanking mixes with DRM and power management, causing bad results." On one PC where DRM is required in order to run X11, I compiled suspend/resume out of the kernel, but still have to be careful because the DRM power blanking absolutely insists on putting the monitor to sleep. The setterm command does nothing to help. On the PC that does not require DRM, I ripped out DRM, the console framebuffer, and suspend/resume; I've had no problems to date. Anyway, where the oopses happen, they happen either when the console is blanking, or when the console is being awakened.

On that front, there's been a tty locking fix and a DMI fix around kernel 3.8.8, so things like this might go away, might not. That's my two cents.
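Since the lock-up also happens without X running, the kernel's own console blanking is a candidate too. A sketch that reads the current timeout (in seconds, 0 meaning never), assuming a kernel that exposes it as /sys/module/kernel/parameters/consoleblank; the helper names are made up for this example:

```shell
#!/bin/sh
# Read the console blanking timeout (seconds; 0 = never blank).
# The kernel exposes it read-only under /sys/module/kernel/parameters.
console_blank_timeout() {
  file=${1:-/sys/module/kernel/parameters/consoleblank}
  [ -r "$file" ] && cat "$file"
}

# Print a human-readable summary: "never", "after Ns" or "unknown".
describe_blank() {
  t=$(console_blank_timeout "$@") || { echo "unknown"; return 1; }
  if [ "$t" -eq 0 ]; then
    echo "never"
  else
    echo "after ${t}s"
  fi
}

echo "Console blanking: $(describe_blank)"
```

Because the file is read-only, the timeout has to be changed either with `setterm -blank 0` run on the console itself, or for every boot with the `consoleblank=0` kernel parameter (for example on the append= line in lilo.conf, followed by rerunning lilo).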

AceofSpades19 04-26-2013 09:41 PM

Quote:

Originally Posted by caravel (Post 4939550)
Suspend to disk appears to work, shuts down, etc but when I boot up it's no different to a normal boot.

Maybe you have already done this - but did you update your lilo.conf to include
Code:

image = /boot/vmlinuz
  root = /dev/sda1
  append=" resume=/dev/sda4"
  label = Linux
  read-only

That append="resume=..." line? It needs to point to your swap partition, otherwise suspend to disk will not work (and remember to rerun lilo after editing lilo.conf).
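The resume= device must be the swap partition the image is written to. A sketch that lists the active swap partitions from /proc/swaps and prints a matching append line; /dev/sda4 in the example above is only an example, and swap_partitions is a hypothetical helper name. Swap files are skipped on purpose, since resuming from a file needs extra parameters.

```shell
#!/bin/sh
# List active swap *partitions* from /proc/swaps.
# Format: a header line, then "Filename Type Size Used Priority" per entry.
swap_partitions() {
  file=${1:-/proc/swaps}
  awk 'NR > 1 && $2 == "partition" { print $1 }' "$file" 2>/dev/null
}

for dev in $(swap_partitions); do
  echo "append=\" resume=$dev\""
done
# After editing /etc/lilo.conf, re-run /sbin/lilo as root; otherwise the
# boot loader keeps using the old kernel command line.
```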

edorig 04-27-2013 08:01 AM

caravel: I have found reports of system crashes after hibernate/resume on a system with a P4M800 motherboard like yours at https://bugs.launchpad.net/ubuntu/+s...ux/+bug/459743.
Its dmesg output contains the line ACPI Warning: Incorrect checksum in table [OEMB] - 6D, should be 5E 20090521 tbutils-246, which suggests a problem with the BIOS.
Maybe you should try grep ACPI /var/log/messages ; grep ACPI /var/log/syslog ; dmesg | grep ACPI
to see whether a similar warning appears.
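The three searches can be combined into one pass. A sketch, assuming the log locations /var/log/messages and /var/log/syslog (reading them usually needs root); the filter is deliberately loose and only keeps ACPI lines that mention a warning, error or checksum. acpi_problems is a hypothetical helper name.

```shell
#!/bin/sh
# Filter log text on stdin down to ACPI lines that look like problems.
acpi_problems() {
  grep -i 'acpi' | grep -i -E 'warning|error|checksum'
}

# Feed the kernel ring buffer plus the system logs through the filter;
# missing or unreadable files are silently skipped.
{
  dmesg 2>/dev/null
  cat /var/log/messages /var/log/syslog 2>/dev/null
} | acpi_problems || echo "no ACPI warnings or errors found"
```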

cynwulf 04-28-2013 12:00 PM

Quote:

Originally Posted by mrclisdue (Post 4939674)
Also, from a terminal, see if xset -dpms accomplishes anything.

I haven't had a chance to fiddle with that, but if the crashes continue I'll try it next, thanks.

Quote:

Originally Posted by mlslk31 (Post 4939690)
On that front, there's been a tty locking fix and a DMI fix around kernel 3.8.8, so things like this might go away, might not. That's my two cents.

I've just upgraded the kernel from 3.8.4 to 3.8.10 and so far the lock up has not occurred, but I'll need to see how it goes before assuming the problem is gone.

Quote:

Originally Posted by AceofSpades19 (Post 4939696)
Maybe you have already done this - but did you update your lilo.conf to include [...] that append="resume" line? you need that to point to your swap space, otherwise suspend to disk will not work

Thanks, I don't use suspend to disk, but I'll keep that in mind.
Quote:

Originally Posted by edorig (Post 4939970)
Maybe you should try to do a grep ACPI /var/log/messages ; grep ACPI /var/log/syslog ; dmesg | grep ACPI
to see whether a similar warning appears.

I much appreciate you looking into this. I have searched the logs for ACPI-related errors, but there were none.

I also restored the optimised BIOS defaults and then configured the BIOS to disable the devices I don't use. So far so good.

Thanks again.

cynwulf 05-01-2013 02:46 AM

There have been no more hard lock ups since the upgrade to 3.8.10 - marked as solved, thanks to all.

