LinuxQuestions.org

Slackware forum thread: Random system hang ups (http://www.linuxquestions.org/questions/slackware-14/random-system-hang-ups-753376/)

w1k0 09-07-2009 02:42 PM

Random system hang ups
 
My machine is a ThinkPad T60 running Slackware 13.0 with my custom generic SMP kernel 2.6.29.6 (the only difference between the original and my custom kernel is the boot logo: Tuz in the first, Tux in the second).

My system hangs once or twice a day: the screen freezes, the keyboard locks and, if I am playing music, the sound system starts playing one sound in a loop. I use Window Maker. When the hangs occurred I was usually running a few xterms, two Firefoxes (on two user accounts) and MLDonkey 3.0.0 from SlackBuilds.org. Sometimes I was also using moc 2.4.3 (music on console), also from SlackBuilds.org.

I inspected /var/log/messages trying to determine what happened when the system hung, but I found nothing relevant. The system usually hangs when I switch from one application's window to another, but that isn't a rule.

In this example I was switching between application windows:

Code:

Sep  5 21:34:52 home6 acpid: client 4138[0:100] has disconnected
Sep  5 21:34:52 home6 acpid: client 3624[0:100] has disconnected
Sep  5 21:34:52 home6 acpid: client 2908[0:100] has disconnected
Sep  5 21:34:52 home6 acpid: client 3075[0:100] has disconnected
Sep  5 21:34:52 home6 logger: ACPI action lid is not defined
Sep  5 21:37:00 home6 logger: ACPI action lid is not defined
Sep  5 21:53:19 home6 kernel: [drm] Num pipes: 1
Sep  5 21:53:19 home6 acpid: client connected from 4138[0:100]
Sep  5 21:53:19 home6 acpid: 1 client rule loaded

In this example the machine was left untouched for a few hours:

Code:

Sep  7 02:31:59 home6 kernel: [drm] Num pipes: 1
Sep  7 02:32:00 home6 acpid: client connected from 4157[0:100]
Sep  7 02:32:00 home6 acpid: 1 client rule loaded
Sep  7 02:32:47 home6 acpid: client 3657[0:100] has disconnected
Sep  7 02:32:47 home6 logger: ACPI action lid is not defined
Sep  7 02:55:54 home6 -- MARK --
Sep  7 03:15:54 home6 -- MARK --
Sep  7 03:35:54 home6 -- MARK --
Sep  7 03:55:54 home6 -- MARK --
Sep  7 04:15:54 home6 -- MARK --
Sep  7 04:35:54 home6 -- MARK --
Sep  7 04:55:54 home6 -- MARK --
Sep  7 05:15:54 home6 -- MARK --
Sep  7 05:35:54 home6 -- MARK --
Sep  7 05:55:54 home6 -- MARK --
Sep  7 06:15:55 home6 -- MARK --
Sep  7 06:35:55 home6 -- MARK --
Sep  7 06:55:55 home6 -- MARK --

To pin down the time of that hang more precisely I inspected /var/log/syslog, but I found nothing interesting:

Code:

Sep  7 07:02:13 home6 kernel: INPUT packet died: IN=ppp0 OUT= MAC= SRC=89.227.11.156 DST=98.8.4.136 LEN=48 TOS=0x00 PREC=0x00 TTL=118 ID=31402 DF PROTO=TCP SPT=51570 DPT=19536 WINDOW=8192 RES=0x00 SYN URGP=0
Sep  7 07:02:34 home6 kernel: INPUT packet died: IN=ppp0 OUT= MAC= SRC=86.52.126.197 DST=98.8.4.136 LEN=131 TOS=0x00 PREC=0x00 TTL=112 ID=19300 PROTO=UDP SPT=55026 DPT=31373 LEN=111
Sep  7 07:02:53 home6 kernel: INPUT packet died: IN=ppp0 OUT= MAC= SRC=71.75.82.53 DST=98.8.4.136 LEN=131 TOS=0x00 PREC=0x00 TTL=111 ID=7139 PROTO=UDP SPT=41343 DPT=31373 LEN=111
Sep  7 07:03:13 home6 kernel: INPUT packet died: IN=ppp0 OUT= MAC= SRC=87.194.101.171 DST=98.8.4.136 LEN=131 TOS=0x00 PREC=0x00 TTL=118 ID=15775 PROTO=UDP SPT=48311 DPT=31373 LEN=111

I ran the same applications on this machine under Slackware 12.2 and everything worked well. The main difference between the Slackware 12.2 and Slackware 13.0 installations on this machine is that the new system lacks the proprietary fglrx module: with Slackware 12.2 I used fglrx, and with Slackware 13.0 I use radeon.

Is there any method to determine the cause of such hangs, for example by running some tracking program in the background?
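One possible approach, sketched here under stated assumptions, is a heartbeat logger: a background loop that appends a timestamp to a file every few seconds and calls sync after each line so it reaches the disk. After a hard lockup, the last line in that file brackets the time of the freeze far more tightly than the 20-minute "-- MARK --" lines in the syslog excerpt above. It cannot show the cause of the hang, only when it happened. The function name heartbeat, the log path and the interval are hypothetical choices.

```shell
#!/bin/sh
# heartbeat LOGFILE INTERVAL [COUNT]
# Hypothetical helper: appends one timestamped line to LOGFILE every
# INTERVAL seconds and calls sync after each line, so the line is on disk
# before a possible lockup. COUNT limits the number of lines (handy for
# testing); omit it or pass 0 to run until killed.
heartbeat() {
    logfile=$1
    interval=$2
    count=${3:-0}
    n=0
    while :; do
        # Load average is optional context; empty if /proc is unavailable.
        load=$(cut -d' ' -f1-3 /proc/loadavg 2>/dev/null)
        printf '%s load=%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$load" >> "$logfile"
        sync
        n=$((n + 1))
        if [ "$count" -ne 0 ] && [ "$n" -ge "$count" ]; then
            break
        fi
        sleep "$interval"
    done
}
```

For example, start it with `heartbeat /var/tmp/heartbeat.log 10 &` and, after the reboot that follows a hang, look at `tail -n 1 /var/tmp/heartbeat.log` to see the last moment the system was still alive.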

adamk75 09-08-2009 07:04 AM

You could try disabling direct rendering in the X server to see if that stops the lockups. If your machine keeps locking up even with it disabled, it's very unlikely that the problem is related to the video driver. You can disable it by adding these lines to the "Module" section of your xorg.conf file:

Code:

        Disable "dri"
        Disable "dri2"

If you don't have a Module section, you can create one.

Adam

w1k0 09-08-2009 11:21 AM

Last night my machine hung again while untouched.

Now I have disabled those two modules in /etc/xorg.conf as you suggested:

Code:

Section "Module"
        Load  "glx"
        Load  "extmod"
        Disable  "dri2"
        Load  "dbe"
        Disable  "dri"
EndSection

I inspected /var/log/Xorg.0.log and it seems the dri* modules aren't loaded:

Code:

.
.
.
(WW) "dri2" will not be loaded unless you've specified it to be loaded elsewhere.
(WW) "dri" will not be loaded unless you've specified it to be loaded elsewhere.
(II) "extmod" will be loaded. This was enabled by default and also specified in the config file.
(II) "dbe" will be loaded. This was enabled by default and also specified in the config file.
(II) "glx" will be loaded. This was enabled by default and also specified in the config file.
(II) "dri" will be loaded even though the default is to disable it.
(II) "dri2" will be loaded even though the default is to disable it.
(II) LoadModule: "glx"
.
.
.
(EE) RADEON(0): [dri] RADEONDRIGetVersion failed (libdri.a too old)
[dri] Disabling DRI.
.
.
.
(II) AIGLX: Screen 0 is not DRI2 capable
(II) AIGLX: Screen 0 is not DRI capable
.
.
.

I'm running the machine with the same set of applications as before. Now I can only wait and see whether it hangs again.
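The `(II) "dri" will be loaded even though the default is to disable it.` lines in the excerpt above show up even with `Disable "dri"` in the Module section, so they are a misleading signal. A quicker check is to search the log for the lines that did appear in this thread when DRI was really off: `[dri] Disabling DRI.` and `AIGLX: Screen 0 is not DRI capable`. The following is a minimal sketch; the function name dri_status is hypothetical, and the absence of these markers does not by itself prove that DRI is working.

```shell
#!/bin/sh
# dri_status [LOGFILE]
# Hypothetical helper: prints "DRI off" if the X log contains one of the
# markers seen in this thread when DRI was disabled, otherwise prints
# "no DRI-off marker found". Returns 2 if the log cannot be read.
dri_status() {
    log=${1:-/var/log/Xorg.0.log}
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 2
    fi
    # Matches "[dri] Disabling DRI." and "is not DRI capable" / "is not DRI2 capable".
    if grep -qE '\[dri\] Disabling DRI|is not DRI2? capable' "$log"; then
        echo "DRI off"
    else
        echo "no DRI-off marker found"
    fi
}
```

Run `dri_status` after each X restart to see which state the server actually ended up in.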

kd5zex 09-08-2009 02:38 PM

I am having the same issue, seemingly random lockups, on a Toshiba A105 with 13.0, where 12.2 worked fine. No mouse or keyboard; ssh and networking freeze as well. I have created a new account using default settings and it has been running for ~36 hours so far without a lockup.

w1k0 09-08-2009 07:40 PM

Quote:

Originally Posted by kd5zex (Post 3674654)
I have created a new account using default settings and it has been running for ~36 hours so far without a lockup.

What's the difference between your old accounts and the new one? What do you mean by "default settings"? What command did you use to create the new account?

This is an excerpt from my script for configuring user accounts:

Code:

USER_1000=john
USER_1001=mary

# First user: low-level useradd, then the home directory is set up by hand.
USER="$USER_1000"
DIRECTORY="/home/${USER}/"
if [ ! -e "${DIRECTORY}" ]
then
    echo "Adding ${USER} and configuring his/her account."
    useradd "${USER}"
    mkdir "${DIRECTORY}"
    chmod 711 "${DIRECTORY}"
    chown "${USER}:users" "${DIRECTORY}"
    mkdir "${DIRECTORY}bin/"
    chown "${USER}:users" "${DIRECTORY}bin/"
    chsh -s /bin/bash "${USER}"
    passwd "${USER}"
    touch "/var/spool/mail/${USER}"
    chmod 660 "/var/spool/mail/${USER}"
    chown "${USER}:mail" "/var/spool/mail/${USER}"
fi

# Second user: Slackware's interactive adduser creates the home directory.
USER="$USER_1001"
DIRECTORY="/home/${USER}/"
if [ ! -e "${DIRECTORY}" ]
then
    adduser "${USER}"
    mkdir "${DIRECTORY}bin/"
    chown "${USER}:users" "${DIRECTORY}bin/"
    touch "/var/spool/mail/${USER}"
    chmod 660 "/var/spool/mail/${USER}"
    chown "${USER}:mail" "/var/spool/mail/${USER}"
    rm "${DIRECTORY}.bash_profile"
fi

As you can see, I use the useradd command for the first user and adduser for the second. In the past I used useradd for both users, but it caused some strange problems; that was the first issue I posted here: Midnight Commander hangs up console or xterm.

kd5zex 09-09-2009 01:08 AM

Quote:

Originally Posted by w1k0 (Post 3675027)
What's the difference between your old accounts and the new one? What do you mean by "default settings"? What command did you use to create the new account?

I created the account using adduser and didn't mess with any of the KDE settings (Window Behavior, etc.). So far so good. I used useradd earlier today and my wife said it wouldn't startx properly. She refused to give me any details about the error it was giving so that's all the info you get :doh:.

w1k0 09-09-2009 07:10 PM

It's been 32 hours since I disabled DRI in my xorg.conf and 24 hours since the last reboot. My machine works flawlessly, though I haven't yet beaten the 36-hour record achieved by kd5zex on his Toshiba A105. Anyway, I'm marking the thread as solved. If I encounter the problem again, I'll reopen the thread. Thank you adamk75 for your valuable hint and thank you kd5zex for your help.

adamk75 09-09-2009 08:03 PM

It might be worth reporting this bug upstream on the freedesktop bugzilla if you'd like to get 3D acceleration going again :-) The first thing they'd probably suggest is trying a newer kernel to see if the lockup is still happening.

w1k0 09-09-2009 10:20 PM

I have two ThinkPads: a T60 with an ATI Mobility Radeon X1300 and a T40 with an ATI Mobility Radeon 7500. I installed Slackware 13.0 on both machines. Until now I have used the T60 exclusively.

Before reporting a bug I have to test it thoroughly:

1. Run the T60 with DRI disabled for a few days to make sure the problem has disappeared.
2. Run the T60 with DRI enabled until the first hang to make sure the problem persists.
3. Repeat the test with DRI enabled on the newest kernel version to see whether the problem persists.
4. If the problem persists with the newest kernel, repeat the test with DRI disabled to make sure the problem disappears.
5. Optionally, do all the above tests on my second machine.

These tests take about a week for just one machine. I'll test it thoroughly, but don't expect a quick report here.
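Steps 1 to 4 mean switching DRI on and off repeatedly. Below is a minimal sketch of a helper for that, assuming the Module section uses plain `Load`/`Disable` lines for "dri" and "dri2" as in the xorg.conf excerpts in this thread. set_dri is a hypothetical name; it keeps a .bak copy, leaves commented-out lines alone, and X has to be restarted for the change to take effect.

```shell
#!/bin/sh
# set_dri on|off XORG_CONF
# Hypothetical helper: rewrites uncommented Load/Disable lines for "dri"
# and "dri2" to the requested keyword, keeping a backup in XORG_CONF.bak.
set_dri() {
    mode=$1
    conf=$2
    case $mode in
        on)  kw=Load ;;
        off) kw=Disable ;;
        *)   echo "usage: set_dri on|off XORG_CONF" >&2; return 2 ;;
    esac
    if [ -z "$conf" ]; then
        echo "usage: set_dri on|off XORG_CONF" >&2
        return 2
    fi
    cp "$conf" "$conf.bak" || return 1
    # Keep the indentation (\1) and the module name (\3); swap only the keyword.
    sed -E "s/^([[:space:]]*)(Load|Disable)([[:space:]]+\"dri2?\")/\\1$kw\\3/" \
        "$conf.bak" > "$conf"
}
```

For example, `set_dri off /etc/xorg.conf` before step 1 and `set_dri on /etc/xorg.conf` before step 2, each followed by restarting X.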

w1k0 09-12-2009 09:07 PM

I performed a lot of tests and narrowed the problem down further.

Following my schedule from the post above, I first ran my machine using kernel 2.6.29.6 with DRI disabled for a few days to confirm that disabling DRI removes the problem, and then ran it with DRI enabled until the first hang to confirm that the problem is caused by DRI. Everything matched these assumptions. Then I repeated the test with DRI enabled using the generic SMP kernel 2.6.31 to see whether the problem persists. With kernel 2.6.29.6 I had to wait a dozen or so hours for the machine to hang. With kernel 2.6.31 a dozen or so minutes was enough. So I started further tests, and finally I found a way to hang my machine within a dozen or so seconds.

I use Window Maker and twelve dockable applications: wmCalClock, wmpower, wmnet, wmbiff, wmsm, wmtop, wmdrawer, wmSun, wmMoonClock, wmweather, wmmixer and wminfo.

To run them I use these commands:

Code:

wmCalClock -24 &
wmpower -no-meddling -no-full-battery -no-cpufreq -no-noflushd &
wmnet --device=wlan0 &
wmnet --device=ppp0 &
wmnet &
wmbiff &
wmsm -m -d sda &
wmtop -s 500 -r 50 -x wmtop &
wmdrawer -c /home/first_user/.wmdrawer/wmdrawerrc &
wmSun -lat 51.07 -lon -17.02 &
wmMoonClock -lat 51.07 -lon -17.02 &
wmweather -s EPWR -m &
wmmixer -w &
wminfo -p patches.wmi &
wminfo -p kernel-2.6.wmi &
wminfo -p gazeta-waluty.wmi &

All these programs are available from SlackBuilds.org; some of them require special configuration files or plug-ins.

During my tests I found that the following is enough to hang my machine:

1. Run Window Maker and some of these dockable applications from the first user's account.
2. Run Window Maker, xterm and Midnight Commander from the second user's account.
3. In Midnight Commander, highlight a directory name, then press and hold Enter.

As a result, Midnight Commander flickers, repeatedly entering the directory and going back up. After a dozen or so seconds the system hangs. Disabling DRI removes the problem.

Some dockable applications trigger the problem and some don't. The ones that trigger it include:

● wminfo,
● wmCalClock with wmpower,
● wmsm with wmdrawer,
● wmnet with wmweather and wmmixer,
● some other combinations of dockable applications.

The easiest way to hang the system is to use wminfo. It works with both kernel 2.6.29.6 and 2.6.31. To test it you need a plug-in, for example:

test.wmi
Code:

ps -a | awk '{print $1,$4}' | grep -vE "ps|awk|grep|tac" | tac

To run it, put the test.wmi plug-in in the path and use the command:

Code:

wminfo -p test.wmi

I performed all the above tests on the ThinkPad T60 with the ATI Mobility Radeon X1300. Then I performed a short test on the ThinkPad T40 with the ATI Mobility Radeon 7500. That older machine works flawlessly. So perhaps the problem affects only one ATI Mobility Radeon model, or maybe just my card is faulty.

Finally I tried to connect to https://bugs.freedesktop.org/ but Firefox displayed the warning:

Quote:

You have asked Firefox to connect securely to bugs.freedesktop.org, but we can't confirm that your connection is secure.

Normally, when you try to connect securely, sites will present trusted identification to prove that you are going to the right place. However, this site's identity can't be verified.

If you usually connect to this site without problems, this error could mean that someone is trying to impersonate the site, and you shouldn't continue.

bugs.freedesktop.org uses an invalid security certificate.

The certificate is not trusted because the issuer certificate is unknown.

(Error code: sec_error_unknown_issuer)

If you understand what's going on, you can tell Firefox to start trusting this site's identification. Even if you trust the site, this error could mean that someone is tampering with your connection.

Don't add an exception unless you know there's a good reason why this site doesn't use trusted identification.

I have never reported bugs on the freedesktop bugzilla. Has the problem been tested thoroughly enough to report a bug, or should I perform further tests? Is it safe to connect to bugs.freedesktop.org now, or is it better to wait for a new security certificate?

Maybe someone here has a machine with an ATI Radeon and would be kind enough to spend a quarter of an hour running Window Maker with wminfo from the first account and Window Maker with Midnight Commander from the second one, testing with DRI enabled and disabled?

Any assistance will be welcome.

w1k0 09-13-2009 01:09 PM

Disabling DRI and DRI2 solves the problem with the random hangs, but it causes other problems (see: here).

w1k0 09-16-2009 01:15 AM

I partially solved the problem with random system hangs and described it here. I did it by replacing the default xf86-video-ati driver version 6.12.2 with version 6.12.4. The system still hangs, but about 8.7 times less frequently than with the default driver.
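To confirm which xf86-video-ati build is actually installed after such a replacement, one option on Slackware is the package database in /var/log/packages, where installpkg and upgradepkg record one file per package named NAME-VERSION-ARCH-BUILD. Below is a minimal sketch. The function name is hypothetical, and it assumes the new driver was installed as a package (for example one built with a SlackBuild) rather than copied in with `make install`, which would not be recorded there.

```shell
#!/bin/sh
# ati_driver_version [PKGDIR]
# Hypothetical helper: prints the VERSION field of every installed
# xf86-video-ati package found in PKGDIR (default /var/log/packages).
# Returns 1 if no such package is recorded.
ati_driver_version() {
    pkgdir=${1:-/var/log/packages}
    found=1
    for f in "$pkgdir"/xf86-video-ati-*; do
        [ -e "$f" ] || continue          # the glob matched nothing
        base=${f##*/}                     # strip the directory part
        rest=${base#xf86-video-ati-}      # VERSION-ARCH-BUILD
        echo "${rest%%-*}"                # keep only VERSION
        found=0
    done
    return "$found"
}
```

If this prints 6.12.2 after the upgrade, the default driver package is still the one registered.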

kd5zex 09-17-2009 10:59 AM

Ok the Toshiba has started to freeze again every few hours. I have disabled dri and dri2 and will report back later.

w1k0 09-17-2009 12:43 PM

Quote:

Originally Posted by kd5zex (Post 3687068)
Ok the Toshiba has started to freeze again every few hours. I have disabled dri and dri2 and will report back later.

Try replacing your current driver with the newest one, as I described here. In the following post you'll find a SlackBuild script to build the xf86-video-ati driver version 6.12.4. I replaced the driver 36 hours ago. Since then my machine has hung just once. I'm still observing it.

onebuck 10-04-2009 01:31 PM

Hi,

Quote:

Originally Posted by w1k0 (Post 3676512)
It's been 32 hours since I disabled DRI in my xorg.conf and 24 hours since the last reboot. My machine works flawlessly, though I haven't yet beaten the 36-hour record achieved by kd5zex on his Toshiba A105. Anyway, I'm marking the thread as solved. If I encounter the problem again, I'll reopen the thread. Thank you adamk75 for your valuable hint and thank you kd5zex for your help.

In the '/var/log/Xorg.0.log' excerpt from your previous post, 'dri' doesn't seem to be disabled.
Quote:

(WW) "dri2" will not be loaded unless you've specified it to be loaded elsewhere.
(WW) "dri" will not be loaded unless you've specified it to be loaded elsewhere.
(II) "extmod" will be loaded. This was enabled by default and also specified in the config file.
(II) "dbe" will be loaded. This was enabled by default and also specified in the config file.
(II) "glx" will be loaded. This was enabled by default and also specified in the config file.
(II) "dri" will be loaded even though the default is to disable it.
(II) "dri2" will be loaded even though the default is to disable it.

(II) LoadModule: "glx"

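The two "will be loaded even though the default is to disable it" lines (bold in the original post) indicate that 'dri' is to be loaded. I noticed the same in my '/var/log/Xorg.0.log' after I disabled it. That 'dri' was actually disabled is shown by: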
Code:

excerpt from my '/var/log/Xorg.0.log';
(II) Loading sub module "int10"
(II) LoadModule: "int10"
(II) Loading /usr/lib64/xorg/modules//libint10.so
(II) Module int10: vendor="X.Org Foundation"
        compiled for 1.6.3, module version = 1.0.0
        ABI class: X.Org Video Driver, version 5.0
(II) RADEON(0): initializing int10
(II) RADEON(0): Primary V_BIOS segment is: 0xc000
(II) RADEON(0): Legacy BIOS detected
(EE) RADEON(0): [dri] RADEONDRIGetVersion failed (libdri too old)
[dri] Disabling DRI.

I'm not sure how to go about replacing 'libdri'. Using 'xf86-video-ati-6.12.4.tar.bz2' got me to this point. I'm getting used to the new Xorg. 'X' is not one of my strengths, but now I'll finally have to make the jump and learn more.

I wasn't even able to get this far until I did this:

Code:

Section "Module"
        Load  "extmod"

        SubSection "extmod"
                Option      "omit xfree86-dga"  # don't initialise the DGA extension
        EndSubSection

        Disable "dri2"
        Disable "dri"
#      Load  "dri2"
#      Load  "dri"
        Load  "glx"
        Load  "dbe"
        Load "synaptics"
EndSection

I added the Disable lines for 'dri' and 'dri2'. The hardware seemed to initialize and be recognized differently once they were added. The '200M' was initialized, but that damn 'libdri' got me. The hardware is a 'Dell Inspiron 1501' with an 'ATI Radeon XPRESS 200M 5975 (PCIE)', which has been giving me fits.

I'm reading more than I really want about other people's issues. 'ATI' just drives me nuts over this one. :)


All times are GMT -5.