LinuxQuestions.org
Old 11-22-2004, 02:38 PM   #1
deviance99
Member
 
Registered: Jun 2004
Location: Mount Pleasant, MI
Posts: 41

Rep: Reputation: 16
Random Server Freezes


Hi all, I am running a dual Opteron 64-bit machine with 4 GB of RAM and six Ultra SCSI IBM 140 GB hard drives on two separate Ultra SCSI cards. The OS is Fedora Core 3 x86-64, set up with a 100 MB boot partition, 4 GB of swap, and the rest of one of the disks for / . It seems to boot fine, but shortly afterwards (sometimes before logging in, sometimes after) it freezes and I have to reboot the machine.

I was wondering if anyone had any ideas that could help me out on the problem.
 
Old 11-22-2004, 02:51 PM   #2
deviance99
Member
 
Registered: Jun 2004
Location: Mount Pleasant, MI
Posts: 41

Original Poster
Rep: Reputation: 16
It gives me a machine check exception then Kernel Panic:

like: CPU 0: Machine Check Exception: 7 Bank 4: b41fa000000000000a13

RIP 10:<ffffffffff8010e5e9> {default_idle+0x20/0x23}
TSC 1af6e36cb8b ADDR 472988
Kernel Panic - not syncing: Uncorrected machine check


---- I think it's a memory error ----
anyone else?
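
For anyone trying to work out what the numbers in a line like this mean, here is a minimal Python sketch (not something posted in the thread). It assumes the line has the layout printed by 2.6-era x86-64 kernels, where the value after "Machine Check Exception:" is MCG_STATUS and the value after "Bank N:" is that bank's 64-bit MCi_STATUS. The flag names follow the architectural x86 bit layout; parse_mce_line, decode_status and decode_mcg_status are hypothetical helper names. The status typed in the post above has more than 16 hex digits, so it was probably mistyped; the example therefore uses a made-up 16-digit value.

```python
import re

# Architectural flag bits of an x86 MCi_STATUS register (bit -> name).
STATUS_FLAGS = {
    63: "VAL",    # the register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # the error was not corrected by hardware
    60: "EN",     # reporting was enabled for this error
    59: "MISCV",  # MCi_MISC holds extra information
    58: "ADDRV",  # MCi_ADDR holds a valid address
    57: "PCC",    # processor context may be corrupt
}

# Flag bits of MCG_STATUS (bit -> name).
MCG_FLAGS = {0: "RIPV", 1: "EIPV", 2: "MCIP"}

# Assumed layout: "CPU n: Machine Check Exception: <mcgstatus> Bank n: <status>"
MCE_LINE = re.compile(
    r"CPU\s+(\d+):\s+Machine Check Exception:\s+([0-9a-fA-F]+)"
    r"\s+Bank\s+(\d+):\s+([0-9a-fA-F]+)"
)


def parse_mce_line(line):
    """Split an MCE console line into its numeric fields, or return None."""
    m = MCE_LINE.search(line)
    if m is None:
        return None
    return {
        "cpu": int(m.group(1)),
        "mcgstatus": int(m.group(2), 16),
        "bank": int(m.group(3)),
        "status": int(m.group(4), 16),
    }


def decode_status(status):
    """List the MCi_STATUS flags that are set, highest bit first."""
    return [name for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
            if (status >> bit) & 1]


def decode_mcg_status(mcgstatus):
    """List the MCG_STATUS flags that are set, lowest bit first."""
    return [name for bit, name in sorted(MCG_FLAGS.items())
            if (mcgstatus >> bit) & 1]


if __name__ == "__main__":
    # Hypothetical status value, NOT the one from the post.
    info = parse_mce_line(
        "CPU 0: Machine Check Exception: 7 Bank 4: b400000000000a13")
    print(info["bank"], decode_mcg_status(info["mcgstatus"]),
          decode_status(info["status"]))
```

With the made-up status b400000000000a13 this reports VAL, UC, EN and ADDRV (a valid, uncorrected error with an address recorded), and an MCG_STATUS of 7 decodes to RIPV, EIPV and MCIP. The flags alone do not say whether the RAM, the CPU or something else is at fault.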
 
Old 11-23-2004, 11:41 PM   #3
RJARRRPCGP
Member
 
Registered: Feb 2004
Location: USA (Springfield, Windsor County, Vermont)
Posts: 57

Rep: Reputation: 15
A CPU chip error can cause a machine check exception, too. The CPU chip may be overheating.
 
Old 11-24-2004, 12:53 AM   #4
mritch
Member
 
Registered: Nov 2003
Location: austria
Distribution: debian
Posts: 667

Rep: Reputation: 30
"...ion: 7 Bank 4:" .. check what "mce: 7" means.
try swapping the ram modules..

...and make sure there really is a faulty part (the problem mce reports). i've also heard of mce itself causing false alarms on some machines, but i can't remember hearing about that on any 64-bit procs.

sl mritch
.
 
Old 11-24-2004, 12:25 PM   #5
deviance99
Member
 
Registered: Jun 2004
Location: Mount Pleasant, MI
Posts: 41

Original Poster
Rep: Reputation: 16
Thanks, I booted into Knoppix and am testing the memory right now (using memtest). Everything has been good so far, and I haven't had a freeze either, which is good news. Are there any Knoppix programs that will test my CPUs, or let me know that both are working properly?

I want to make sure it's the OS and not the hardware.
 
Old 11-24-2004, 01:57 PM   #6
mritch
Member
 
Registered: Nov 2003
Location: austria
Distribution: debian
Posts: 667

Rep: Reputation: 30
if memtest (can it handle 64bit well?) doesn't report errors, check that there is no hardware problem like too much heat causing this. if everything seems to work here but the message stays, try to rule out acpi related problems. if it keeps failing and you think the hw is alright, disable mce kernel support. maybe this can be done by passing "nomce" to your kernel at boot time.

however, faulty mem modules can show their errors quite randomly, so i suggest running find over / a few times and compiling a few kernels (to put io & mem load on the machine) to be sure. afaik memtest86 can be booted directly and can check all your mem. use it, if it works on your machine.

re-plug your ram modules for better mechanical contact.

sl mritch.
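
The load mritch suggests comes from running find and compiling kernels. As a rough illustration of the same idea in a few lines, the Python sketch below (an assumption of this write-up, not mritch's method) writes fixed byte patterns into a buffer and reads them back. It only touches whatever memory the OS hands to the process, so it is no substitute for booting memtest86; fill, verify and memory_pass are hypothetical names.

```python
# Classic alternating test patterns: all zeros, all ones, 0101..., 1010...
PATTERNS = (0x00, 0xFF, 0x55, 0xAA)


def fill(buf, pattern):
    """Overwrite every byte of buf with the given pattern byte."""
    buf[:] = bytes([pattern]) * len(buf)


def verify(buf, pattern):
    """Return the offsets whose byte no longer matches the pattern."""
    return [i for i, b in enumerate(buf) if b != pattern]


def memory_pass(size_bytes, patterns=PATTERNS):
    """Run one write/read-back pass per pattern over a fresh buffer.

    Returns a list of (pattern, offset) pairs for every mismatch found;
    an empty list means this pass saw no errors.
    """
    buf = bytearray(size_bytes)
    bad = []
    for pattern in patterns:
        fill(buf, pattern)
        bad.extend((pattern, off) for off in verify(buf, pattern))
    return bad


if __name__ == "__main__":
    # 8 MiB per pass is an arbitrary choice; loop this for hours, not minutes.
    errors = memory_pass(8 * 1024 * 1024)
    print("mismatches:", len(errors))
```

A clean pass proves little on its own; as noted later in the thread, marginal modules may only fail after many hours of testing.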
 
Old 11-24-2004, 08:21 PM   #7
deviance99
Member
 
Registered: Jun 2004
Location: Mount Pleasant, MI
Posts: 41

Original Poster
Rep: Reputation: 16
Thanks for the direction.

I ran cpuburn for about 20 minutes without a problem, so I think it might be something in the kernel. I am going to try to recompile it when I get back to work Monday.
 
Old 11-29-2004, 02:42 PM   #8
deviance99
Member
 
Registered: Jun 2004
Location: Mount Pleasant, MI
Posts: 41

Original Poster
Rep: Reputation: 16
I have had no luck mounting the SCSI drives in Knoppix, which is strange, since both are fairly new Ultra SCSI cards from Adaptec. I have, however, been able to boot into the Fedora rescue environment (where I was missing the necessary libraries to compile the kernel), Knoppix, and knoppix-x86-64, and run each for long periods of time without a freeze.

I was using the hard drives in the Fedora rescue environment too, so I don't think they are bad, and it seems like the CPU and memory are fine.

So, do I conclude it's just the kernel, and find a way to recompile it?
 
Old 11-29-2004, 03:39 PM   #9
mritch
Member
 
Registered: Nov 2003
Location: austria
Distribution: debian
Posts: 667

Rep: Reputation: 30
did i get you right?
you disabled mce error checking and ran a memtest where no errors showed up, but your machine keeps freezing after random uptime.

btw. a 20 minute check will not give you reliable results. i once had faulty modules and it took a 1 1/2 day run (memtest86) to show them all.

you should be able to get an environment for building a kernel with the development packages from your distribution.
then get the kernel source from your distribution, or download the pristine one from ftp.kernel.org . i don't suggest using 2.6 yet, but it may be necessary to get your hardware supported for sure.. get 2.6.9, which seems to work (for one of my servers).
you can also just try another prebuilt kernel from your distribution.

you said your scsi controller is quite new, so google whether there are problems with kernel support here. but i can hardly think of a relation between your hangs and a failing scsi controller (driver).

to actually build a kernel, change to the kernel source directory and run "make menuconfig". select everything you need and build your kernel. there are instructions about that on the net.

you can also build it on another machine and later install it on the box.

sl mritch.
 
Old 11-30-2004, 02:10 PM   #10
deviance99
Member
 
Registered: Jun 2004
Location: Mount Pleasant, MI
Posts: 41

Original Poster
Rep: Reputation: 16
Thanks for the reply,

I haven't disabled MCE error checking; I don't know how to. I was under the impression it was a kernel option. I haven't been able to recompile the kernel because I only have one x86-64 machine, which is the one that keeps freezing, and I thought that I needed to recompile on that platform to get an accurate view of the problem. Would I be able to test it accurately with a 32-bit compilation of the kernel, or do I need a 64-bit one?

Unfortunately, the machine will run maybe 5 minutes before it freezes. I then tried Knoppix, but had no luck with the SCSI hard drives. I did a Google search, which turned up nothing for me, so I thought I would ask this forum, since there are many bright minds here that have helped me in the past.

I will also try to find a prebuilt kernel, something I hadn't thought of before, and if that doesn't work I can let memtest86 run over the weekend.

Thanks again for your knowledge.
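
On the question of whether MCE checking is currently switched off: as mritch suggested earlier, it may be controlled by a boot parameter rather than a recompile. The Python sketch below simply looks for such a parameter on the running kernel's command line. It assumes the option is spelled "nomce" (from mritch's post) or "mce=off" (the x86-64 spelling as far as I recall; check your kernel's boot-options documentation), and mce_disabled and read_cmdline are hypothetical helper names.

```python
def mce_disabled(cmdline):
    """Return True if the command line asks for machine checks to be off.

    Only whole options are matched, so e.g. "nomcefoo" does not count.
    The two spellings checked here are assumptions; verify them against
    the documentation for the kernel you are running.
    """
    options = cmdline.split()
    return "nomce" in options or "mce=off" in options


def read_cmdline(path="/proc/cmdline"):
    """Read the boot parameters the running kernel was started with."""
    with open(path) as f:
        return f.read().strip()


if __name__ == "__main__":
    cmdline = read_cmdline()
    print("kernel command line:", cmdline)
    print("machine checks disabled by boot option:", mce_disabled(cmdline))
```

Turning the check off hides the report rather than the fault, so it is more useful as a diagnostic step than as a fix.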
 
Old 11-30-2004, 03:43 PM   #11
mritch
Member
 
Registered: Nov 2003
Location: austria
Distribution: debian
Posts: 667

Rep: Reputation: 30
i too get the impression it has nothing to do with your hardware. so maybe just get a new kernel. i use debian myself, so i can only tell you where to go for a prebuilt one from debian. normally you should be able to download one from fedora. have a look at their homepage and ftp server (there will likely be one where you can get updates).

you can build a kernel on a 32-bit machine, but you can't test it there, and it's quite complex since you need all the 64-bit libs and have to customize the kernel's makefile; likely some other things too..

so i think it's best to get a prebuilt one and install it next to your current one first to see if it fixes the problem.

luck,
mritch.
 
Old 12-02-2004, 06:28 PM   #12
deviance99
Member
 
Registered: Jun 2004
Location: Mount Pleasant, MI
Posts: 41

Original Poster
Rep: Reputation: 16
I downloaded an updated kernel and installed it; the server has been running fine all day. Thanks for your help, everyone.
 
  

