LinuxQuestions.org

swordphsh 02-21-2010 06:50 PM

Ubuntu server crashes, unknown errors on console only
 
2 Attachment(s)
Hi, I have a computer running Ubuntu Server 8.04 32-bit. It has a few hard drives: one partition holds the root OS, and the rest of the space is configured as LVM and formatted as XFS.

About every two weeks or so, the box goes "dead" and I receive these errors on my serial console screen. It is completely unresponsive and only comes back after a hard reboot or power cycle.

I have attached two screenshots of the serial console from today; the errors are usually similar to these. I tried booting into recovery mode and checking both partitions with the XFS check/repair utility, but the problem persists. I'm not sure if it makes a difference, but this usually happens on days when I torrent about 20GB up and down...legally, of course.

Any help would be appreciated. Thanks in advance.

irmin 02-22-2010 04:54 AM

Can you post a few lines from above the screenshot?

It looks as if there is a bug in the XFS kernel module. Another possibility for the error is malicious RAM or so. Try running memtest86+ to check that your RAM is OK.

swordphsh 02-22-2010 12:00 PM

Quote:

Can you post a few lines from above the screenshot?
I'm not sure what you're asking...Do you want me to type the error messages into a post? The screenshots are the only record I have of the errors, I can't find them in any logs.

Quote:

It looks as if there is a bug in the XFS kernel module. Another possibility for the error is malicious RAM or so. Try running memtest86+ to check that your RAM is OK.
Malicious, as in bad RAM? I'll give that a try later tonight.

Thanks for the quick response.

irmin 02-22-2010 12:24 PM

Your screenshots begin with "Call Trace:". Normally there are some lines above it that belong to the error dump too. They identify the source of the error, such as a NULL pointer dereference, an illegal instruction, an inability to handle a paging request, ...

If the log on the serial console can be saved, it would be useful if you could post those lines.

http://upload.wikimedia.org/wikipedi...l_panic-v2.jpg and
http://www.roberthancock.com/kerneloops.png provide examples of a kernel panic dump.
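
Once the serial console output is being saved to a file, the lines just above each "Call Trace:" can be pulled out with something like the following. This is only a rough sketch: the function name, the default of 15 context lines, and taking the log file path from the command line are all assumptions for illustration, not something from the screenshots.

```python
#!/usr/bin/env python3
"""Print the lines that precede each "Call Trace:" in a saved console log."""
import sys
from collections import deque


def context_before_call_trace(lines, marker="Call Trace:", keep=15):
    """Return a list of (line_number, preceding_lines) for each marker hit.

    `keep` (an assumed default) is how many earlier lines to retain.
    """
    recent = deque(maxlen=keep)  # sliding window of the most recent lines
    results = []
    for number, line in enumerate(lines, start=1):
        if marker in line:
            results.append((number, list(recent)))
        recent.append(line.rstrip("\n"))
    return results


# Only runs when a log file path (hypothetical) is given as an argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log:
        for number, before in context_before_call_trace(log):
            print(f"--- lines before Call Trace at line {number} ---")
            print("\n".join(before))
```

The lines it prints are the ones worth posting along with the trace itself.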

swordphsh 02-22-2010 12:48 PM

Quote:

If the log on the serial console can be saved, it would be useful if you could post those lines.
I configured logging on the serial console. I'll post the results when it happens again.

Thanks again.

swordphsh 02-25-2010 07:40 PM

2 Attachment(s)
I'm not sure if this is the same error, but the error messages look similar. The logging didn't work, but I did manage to grab some screenshots. It appears to be constantly spamming this "set" of errors every...11 seconds...I guess. Also, I was able to use SysRq over serial to shut everything down and reboot the machine. Normally it is completely unresponsive and does not repeat any errors like it did this time.

I haven't had a chance to run Memtest yet because it's headless at a remote location and I can't figure out how to run it over serial.

Thanks again.

swordphsh 02-27-2010 02:58 PM

*Bump* Can anyone help me out here please?

syg00 02-27-2010 05:02 PM

Might be memory allocation. Knock up a script to save /proc/meminfo and /proc/slabinfo every hour or two, with a date/time header. Make sure you get a snapshot before and after any of those torrent runs. That should let you easily spot any oddball jumps.
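
A minimal sketch of such a script, assuming Python 3 is available on the box. The function names and the log file path passed on the command line are made up for illustration. Each run appends one snapshot, so it is meant to be started from cron every hour or two. The file is synced to disk after each write, since the machine tends to go down hard.

```python
#!/usr/bin/env python3
"""Append a timestamped copy of /proc/meminfo and /proc/slabinfo to a log."""
import os
import sys
from datetime import datetime

# The two files named in the post above.
SOURCES = ["/proc/meminfo", "/proc/slabinfo"]


def read_source(path):
    """Return the file contents, or a note if it cannot be read."""
    try:
        with open(path) as f:
            return f.read()
    except OSError as exc:  # e.g. slabinfo may need root
        return f"(could not read {path}: {exc})\n"


def build_snapshot(sources, now=None):
    """Build one snapshot: a date/time header, then each source file."""
    now = now or datetime.now()
    parts = [f"===== {now.strftime('%Y-%m-%d %H:%M:%S')} =====\n"]
    for path in sources:
        parts.append(f"--- {path} ---\n")
        parts.append(read_source(path))
    return "".join(parts)


def append_snapshot(log_path, sources=SOURCES):
    """Append a snapshot to log_path and force it out to disk."""
    with open(log_path, "a") as log:
        log.write(build_snapshot(sources))
        log.flush()
        os.fsync(log.fileno())  # so the data survives a hard crash


# Only runs when a log path (hypothetical) is given as an argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    append_snapshot(sys.argv[1])
```

For example, a crontab entry like `0 * * * * python3 /root/memwatch.py /var/log/memwatch.log` (both paths are placeholders) would take a snapshot on the hour. Run it by hand right before and after a torrent session as well.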


All times are GMT -5.