LinuxQuestions.org

Extreme Stability Issues With Sid (https://www.linuxquestions.org/questions/debian-26/extreme-stability-issues-with-sid-406523/)

nceterval 01-22-2006 09:38 AM

Sid stability issue resolved! Quick Amarok database, X display permission questions.
 
I recently installed Debian Sid over an Ubuntu partition because I had become frustrated with stability problems in Ubuntu. Not coincidentally, I first installed Ubuntu some months ago to replace a Debian Sid install which was also having stability problems. Unfortunately, the new Debian install is having similar problems. Perhaps this is a hardware problem? I've run memtest86+ overnight and found no errors, but I'm not sure what other hardware tests to do. I will describe my symptoms below:

I'm using all the latest packages from the Sid repositories.

Various programs (Nicotine, Gaim, Kopete, vorbisgain, slimserver when scanning for music) routinely crash after about 10s or so. When running these programs from the command line I will usually see either a 'segfault' error or none at all.

Other programs (konqueror, amarok, superkaramba) live longer, perhaps half an hour or so, but also frequently crash while I'm using them.

Some programs have no observable or unexpected stability issues (K3B, konsole, opera).

During times of high load (ripping a CD, launching apps) random apps seem prone to crashing, particularly kicker and kwin, and occasionally the X server will restart. High load seems to exacerbate the aforementioned stability problems as well.

Most distressing is that xorg occasionally crashes hard. When this happens I will be able to move the cursor, but no keyboard or mouse input will be responded to (I cannot ctrl+alt+backspace to restart x) and the screen will freeze, so I have to hard reboot. I'm using the 8178 nvidia drivers and the xorg server with a Geforce 5900.

I understand that Sid is unstable, but I don't believe this behaviour is normal. Thanks for any help!
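
For anyone trying to tell a 'segfault' apart from an ordinary error when launching a program from the command line, here is a minimal sketch. It relies on the usual shell convention that an exit status above 128 means the process was killed by a signal (status minus 128), so 139 corresponds to signal 11, SIGSEGV. The function names `describe_exit` and `run_and_report` are made up for illustration.

```shell
#!/bin/sh
# Describe a shell exit status. Statuses above 128 conventionally mean the
# process was killed by signal (status - 128); 139 is signal 11 (SIGSEGV).
describe_exit() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "exited cleanly"
    elif [ "$status" -gt 128 ]; then
        echo "killed by signal $((status - 128))"
    else
        echo "exited with error $status"
    fi
}

# Run a command, capture its exit status, and report how it ended.
run_and_report() {
    rc=0
    "$@" || rc=$?
    describe_exit "$rc"
}
```

Something like `run_and_report gaim` (or whichever program keeps dying) would then say whether it was a signal or a plain error exit.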

HappyTux 01-22-2006 12:15 PM

No, that is not normal behavior. Segfault errors are usually memory related, but I see you have run memtest and it passed. Do you have another machine you can put the hard drive in to run for a while? Things I would try: test the hard drive with your manufacturer's diagnostic software. If you are overclocking the CPU, put it back to default settings. If you are using fastwrites/side banb addresing for the nVidia card, turn it off; if you are using renderaccel in your X config, disable it. Try going to the console, stopping the X server, and compiling a kernel 5 or 10 times in a row to eliminate CPU/memory problems. Check the temperatures of the CPU and video card on an ongoing basis. Keep a console window with top running where you can see it at all times, so you can check whether a particular process is using a lot of CPU power when something hangs or segfaults. What hardware do you have in the machine besides the 5900?
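
The repeated-compile test could be scripted roughly like this. It is only a sketch: `repeat_until_fail` is a hypothetical helper name, and the example command in the comment assumes you already have a configured kernel source tree in the current directory.

```shell
#!/bin/sh
# Run a command repeatedly and stop at the first failure.
# Usage: repeat_until_fail COUNT COMMAND [ARGS...]
repeat_until_fail() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        echo "run $i of $count: $*"
        if ! "$@"; then
            echo "run $i failed" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $count runs passed"
}

# Example (assumes a configured kernel tree in the current directory):
#   repeat_until_fail 5 sh -c 'make clean && make'
```

If a build that passed on run 1 fails on run 3 with nothing changed in between, that points at hardware (heat, clocks, memory) rather than the sources.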

nceterval 01-22-2006 01:08 PM

Linux 2.6.15-1-K7
Hardware:
Athlon XPM 2400+ OCed to 2.3GHz
1GB of WINTEC DDR RAM (refurbished, IIRC)
A-Bit NF7 mobo
250GB Maxtor HDD with two Reiser3 partitions

I was concerned that the overclocked CPU might be a problem, but I dismissed it once it passed the memtest. Also, this box runs LiveCDs without any apparent stability issues (which, I suppose, points to either the HDD or software as the problem). Should I still try turning that off for a while?

I had the 5900 OCed to 5950 speeds, but I just disabled that. We'll see if that helps the X problems, but I'm doubtful because I had it OCed under the old Ubuntu setup, as well, and I wasn't having problems with X crashing then.

I took a look for 'renderaccel' in xorg.conf and it wasn't there, so I assume it's not enabled. I'm not familiar with fastwrites/side banb addresing, is that something that would be set in the BIOS?

Running hard drive diagnostic tools sounds interesting, I hadn't considered the HDD as a point of failure. How do I do that, exactly? Is there a Debian package? You mention the manufacturer's tools, how are those used?

I do have another box I could transplant the HDD into, but I can't afford to spare it at the moment. I might try that later this week.

HappyTux 01-22-2006 03:20 PM

Quote:

Originally Posted by nceterval
Linux 2.6.15-1-K7
Hardware:
Athlon XPM 2400+ OCed to 2.3GHz
1GB of WINTEC DDR RAM (refurbished, IIRC)
A-Bit NF7 mobo
250GB Maxtor HDD with two Reiser3 partitions

You may want to try a 2.6.14 or earlier kernel. I use the Con Kolivas kernel patches, and yesterday he posted to his mailing list updating the older 2.6.14 patches, saying the 2.6.15 kernel was not the best for stability. Give that a try first.
Quote:

I was concerned that the overclocked CPU might be a problem, but I dismissed it once it passed the memtest. Also, this box runs LiveCDs without any apparent stability issues (which, I suppose, points to either the HDD or software as the problem). Should I still try turning that off for a while?
That certainly points toward the software on the system. I take it you have been running the LiveCDs for hours at a time?

Quote:

I had the 5900 OCed to 5950 speeds, but I just disabled that. We'll see if that helps the X problems, but I'm doubtful because I had it OCed under the old Ubuntu setup, as well, and I wasn't having problems with X crashing then.
It can cause problems. I have a 5900XT that I tried to push up to 5900 speeds, and it locked up X solid every time I tried it.

Quote:

I took a look for 'renderaccel' in xorg.conf and it wasn't there, so I assume it's not enabled. I'm not familiar with fastwrites/side banb addresing, is that something that would be set in the BIOS?
For fastwrites/side band addressing (had a typo there), those are enabled by passing options to the nVidia driver, although my BIOS also has a fastwrites option which had to be turned on before I could use it. Renderaccel is an option that can be set in the X config file. If you are not familiar with any of these, then you most likely have not turned them on, except that fastwrites may be set in the BIOS, which can still cause problems.
Quote:

Running hard drive diagnostic tools sounds interesting, I hadn't considered the HDD as a point of failure. How do I do that, exactly? Is there a Debian package? You mention the manufacturer's tools, how are those used?

I do have another box I could transplant the HDD into, but I can't afford to spare it at the moment. I might try that later this week.
You can go to Maxtor's website and download the Maxblast diagnostic tool. You will need a Windows machine for this, as you have to run the .exe to create a bootable floppy. You then boot from the floppy and run the diagnostics on the HD; of course, the floppy needs to be a higher boot device than the HD. All in all, I would say try the lower kernel first, then move on to the HD, given the problems when scanning the files. Just do not change too many things at once. Go one step at a time, otherwise you will never know which part was causing the problems.

nceterval 01-22-2006 03:30 PM

Alright, I will try a lower kernel. I see that you're using the Kolivas kernel, is that the one with userspace preempting (or something fancy-sounding like that) to make GUIs feel more responsive? How do you like it? Would you recommend it? Did you compile it yourself, or are there Debian packages for it somewhere?

Thanks for all the help!

haertig 01-22-2006 04:18 PM

How hot does your system run? The combo of high CPU load + overclock + heat may be your problem. Two out of three may not trigger the problem. e.g., running from a LiveCD may not ever generate the "high CPU load" part of the equation because the CPU is always waiting on CDROM IO and never gets around to heating itself up.

Check your temps, stop overclocking, and see if that helps. I would recommend testing this hypothesis, not theorizing/rationalizing it out of existence without some physical test. Heat is well known to cause instability. CPU load causes heat. Overclocking causes heat. Overclocking can cause stability issues of its own.

When you're troubleshooting and hear hoofbeats, think horses, not zebras. Make sure you've really ruled out the common causes before advancing on to more exotic possibilities. Kernels, video drivers, etc. could certainly be playing a part here. But I personally doubt it. My bet goes for mundane problems with heat/overclocking, especially since this problem has been plaguing you across different distros, which most likely were already using different kernels and video drivers.
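
A rough sketch of the "check your temps" step, for what it's worth. It assumes a kernel that exposes thermal zones under /sys/class/thermal with values in millidegrees Celsius; that path is an assumption about your system, and on a box like this one the `sensors` command from lm-sensors may be the more practical source. The function names are made up for illustration.

```shell
#!/bin/sh
# Print a temperature file's value in whole degrees Celsius.
# The file is assumed to hold an integer in millidegrees (e.g. 45000).
read_temp_c() {
    millis=$(cat "$1") || return 1
    echo "$((millis / 1000))"
}

# Log every readable thermal zone once, with a timestamp.
# The sysfs path is an assumption; adjust it for your hardware.
log_temps() {
    for zone in /sys/class/thermal/thermal_zone*/temp; do
        [ -r "$zone" ] || continue
        echo "$(date '+%H:%M:%S') $zone $(read_temp_c "$zone")C"
    done
}
```

Calling `log_temps` every few seconds while ripping a CD or compiling, once overclocked and once at stock speeds, would give you the physical test instead of a theory.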

HappyTux 01-22-2006 04:54 PM

Quote:

Originally Posted by nceterval
Alright, I will try a lower kernel. I see that you're using the Kolivas kernel, is that the one with userspace preempting (or something fancy-sounding like that) to make GUIs feel more responsive? How do you like it? Would you recommend it? Did you compile it yourself, or are there Debian packages for it somewhere?

Thanks for all the help!

The patches work well. I really have nothing to compare them to as far as recent kernels go, since I have been using the patches for so long, but when I first started you could really see/feel the difference, so I have stuck with them since. You have to patch the kernel sources from http://kernel.org and compile yourself; there are no Debian packages. Would I recommend it? Yeah, why not. It works well for me, and the worst that can happen is that you do not like the kernel you compile and go back to your already working kernel. Here is a guide to compiling a kernel the Debian way, and if you want I could post my .config where you could download it to start from. It is for a 2.6.12.5 kernel, as I am sort of stuck there if I want to keep using my remote control (the lirc modules will not compile on 2.6.13 or greater), but it could be a good starting point: it already has reiser compiled in, so no initrd is needed, and it probably needs only minor modifications for netcard, sound card, things like that ...

farslayer 01-22-2006 05:55 PM

I would agree with haertig here. Put that CPU back down to its designated clock rate and monitor the system. Until you do the basics you are most likely wasting your time chasing ghosts. 9 times out of 10 it's the simple solutions that we skip or overlook that are causing the problem.

My Sid machine is rock steady. It has never crashed; the only time it ever went down hard was due to a power outage.

nceterval 01-22-2006 06:23 PM

Alright, I throttled the clock settings back to the factory defaults and, after about 10 minutes, everything seems to be stable! I'll wait a day or two before I'm satisfied, but so far I'm very pleased!

It's beyond me why the stability issues that were showing up at the OS level (presumably) due to overheating weren't a problem during the memtest, which I believe does some pretty CPU-stressing tests, but I suppose it doesn't matter.

Thanks to all those who helped!



While I'm here, there are a couple other problems I've been having:

  1. Root console cannot launch X apps, for instance Synaptic returns the following error:
    Code:

    (synaptic:9778): Gtk-WARNING **: Locale not supported by C library.
            Using the fallback 'C' locale.
    Xlib: connection to ":0.0" refused by server
    Xlib: Invalid MIT-MAGIC-COOKIE-1 key

    (synaptic:9778): Gtk-WARNING **: cannot open display:

    After googling, I found that running "xauth merge ~/.Xauthority" at the beginning of every X session fixes this. Is there a more permanent solution? Perhaps simply placing the above command in ~/.kde/Autostart?
  2. amaroK does not seem capable of automatically updating its database; I can only do a full manual rebuild. I assume there's some sort of filesystem monitoring package I'm missing.

haertig 01-22-2006 06:33 PM

Quote:

Originally Posted by nceterval
Alright, I throttled the clock settings back to the factory defaults and, after about 10 minutes, everything seems to be stable! I'll wait a day or two before I'm satisfied, but so far I'm very pleased!

If ten minutes of system uptime has got you this happy, you've got nothing but good things ahead of you! I hope it remains stable for you. If not, report back here and maybe better help will be forthcoming from others.

haertig 01-22-2006 06:43 PM

Quote:

While I'm here, there are a couple other problems I've been having...
It would be much better for you to start a new thread for each problem. Synaptic and amaroK problems don't really fit into a "Stability" thread, thus you won't get the readership you need to find knowledgeable people in these areas who can help you.

It's OK to post multiple threads. One topic per thread. You'll get much better help. Use appropriate subject lines so you attract the right people. For example, a subject line of exactly what you said before: "Root console cannot launch X apps" will do you MUCH better than "New to Debian - need some help!"

[edit]
Oh, I now see that you changed the subject line of your original thread. I must have missed that before. However, I still recommend starting new threads for new problems rather than trying to redirect an old thread with a new subject line. I'd say, don't edit your subject line unless it's so glaringly obvious that it was terrible to start with. Your original subject line for this thread - I can't remember exactly what it was now - was quite good, if I remember correctly.
[/edit]

HappyTux 01-22-2006 07:05 PM

Quote:

Originally Posted by nceterval
Alright, I throttled the clock settings back to the factory defaults and, after about 10 minutes, everything seems to be stable! I'll wait a day or two before I'm satisfied, but so far I'm very pleased!

It's beyond me why the stability issues that were showing up at the OS level (presumably) due to overheating weren't a problem during the memtest, which I believe does some pretty CPU-stressing tests, but I suppose it doesn't matter.

Thanks to all those who helped!

Memtest really stresses the memory controller and the memory chips themselves; the CPU is only there to keep the program running. Unless, of course, you have an on-CPU memory controller like I have with an amd64 chip, in which case the CPU is a factor.


Quote:

While I'm here, there are a couple other problems I've been having:

  1. Root console cannot launch X apps, for instance Synaptic returns the following error:
    Code:

    (synaptic:9778): Gtk-WARNING **: Locale not supported by C library.
            Using the fallback 'C' locale.
    Xlib: connection to ":0.0" refused by server
    Xlib: Invalid MIT-MAGIC-COOKIE-1 key

    (synaptic:9778): Gtk-WARNING **: cannot open display:

    After googling, I found that running "xauth merge ~/.Xauthority" at the beginning of every X session fixes this. Is there a more permanent solution? Perhaps simply placing the above command in ~/.kde/Autostart?
  2. amaroK does not seem capable of automatically updating its database; I can only do a full manual rebuild. I assume there's some sort of filesystem monitoring package I'm missing.

For problem one put a couple of lines like this in your /root/.bashrc.

Code:

## allows me to run an X program as root

export XAUTHORITY=/home/stephen/.Xauthority

Of course, change the /home/stephen/ part to your user's home. For the second I have no clue. I never use amarok; I find it to be a complete CPU hog compared to xmms, which I rather like. Plus, when adding my collection it takes forever, and I had to break it up into pieces, whereas xmms took a couple of minutes max, this with 10,000+ files.
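
A slightly more defensive variant of the same idea, as a sketch only: it exports XAUTHORITY just when the cookie file is actually there and readable, so a root shell started outside an X session is left alone. `xauthority_in_home` is a made-up helper name and `desktop_home` is a placeholder for your own user's home directory.

```shell
# Print the .Xauthority path inside a home directory, or fail if the
# file is missing or unreadable.
xauthority_in_home() {
    candidate="$1/.Xauthority"
    if [ -r "$candidate" ]; then
        echo "$candidate"
    else
        return 1
    fi
}

# In /root/.bashrc (desktop_home is a placeholder; set it to your own home):
#   desktop_home=/home/stephen
#   if path=$(xauthority_in_home "$desktop_home"); then
#       export XAUTHORITY="$path"
#   fi
```

The plain export works just as well on a single-user desktop; the check only matters if the file might not exist yet.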


All times are GMT -5.