LinuxQuestions.org
Linux - Enterprise This forum is for all items relating to using Linux in the Enterprise.

02-04-2008, 04:34 PM   #1
ajatiti
Machine check exception on RHEL4


I am facing a problem with a Dell PowerEdge 2950 server running RHEL4 (kernel 2.6.9-5).

The server hangs with the output below on the screen. I took the screenshot after connecting through the DRAC console.
Dell recommended updating the firmware (BIOS and BMC). We did that and are still having the same problem.
The DRAC records all hardware events, and we can see a "cpu mach chk" error in those logs. The front panel of the physical server displays the same error.

In September we also had a similar problem, which occurred twice; we then replaced the motherboard, CPU, and riser.

Now Dell says they cannot see any hardware problem and want us to look for OS issues.

Do you think it could be an OS issue? Has anyone had the same issue?

This is what I saw on the console:

stack: ffffffff8011ba9a 0000000000000000 0000000000000002 0000000000000000
0000000000000000 0000000000000900 00000000ffffffff ffffffff803beea0
00007730a18eb238 ffffffff8011bad7
Call Trace:<ffffffff8011ba9a>{smp_really_stop_cpu+0} <ffffffff8011bad7>{smp_send_stop+52}
<ffffffff80135106>{panic+235} <ffffffff8011744f>{print_mce+159}
<ffffffff80117510>{mce_available+0} <ffffffff80117855>{do_machine_check+811}
<ffffffff8010e6cc>{mwait_idle+86} <ffffffff8010e6cc>{mwait_idle+86}
<ffffffff8011115b>{machine_check+127} <ffffffff8010e6cc>{mwait_idle+86}
<EOE> <ffffffff8010e65c>{cpu_idle+26}

Code: eb f6 85 db 7e 0a 8b 45 14 44 39 e0 74 02 eb f6 31 c0 85 db
console shuts up ...
NMI Watchdog detected LOCKUP on CPU1, registers:
CPU1
Modules linked in: e1000(U) md5 ipv6(U) autofs4 i2c_dev i2c_core sunrpc ds yenta_socket pcmcia_core button battery ac sr_mod(U) usb_storage joydev uhci_hcd ehci_hcd bnx2(U) dm_snapshot dm_zero dm_mirror ext3 jbd(U) dm_mod mptfc(U) mptsas(U) mptspi(U) mptscsih(U) mptbase(U) megaraid_mbox(U) megaraid_mm(U) megaraid_sas sd_mod scsi_mod
Pid:3864, comm: hald Tainted: GF M 2.6.9-5.ELsmp
RIP: 0010:[<ffffffff802f88c4>]

Thanks.
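The "Tainted: GF M" field on the "Pid:" line of the console output above can also be decoded. Below is a minimal Python 3.9+ sketch that assumes the upstream 2.6-era meanings of the taint letters; the RHEL4 kernel may define extra letters of its own, which the sketch simply reports as unknown. Under that assumption, the "M" letter is consistent with the kernel having recorded a machine check.

```python
# Upstream 2.6-era meanings of the taint letters printed after "Tainted:".
# Assumption: the RHEL4 kernel uses the same letters for these flags.
TAINT_FLAGS = {
    "P": "proprietary module loaded",
    "G": "only GPL-compatible modules loaded",
    "F": "module was force-loaded",
    "S": "SMP kernel on hardware not certified as SMP-safe",
    "R": "module was force-unloaded",
    "M": "machine check exception was raised",
    "B": "bad page reference detected",
}


def decode_taint(line: str) -> list[str]:
    """Return human-readable meanings for the flags after 'Tainted:'."""
    marker = "Tainted:"
    start = line.find(marker)
    if start == -1:
        return []
    tokens = line[start + len(marker):].split()
    # The flags sit between "Tainted:" and the kernel release string;
    # the release is the first token that starts with a digit.
    flags = []
    for token in tokens:
        if token[0].isdigit():
            break
        flags.extend(token)
    return [TAINT_FLAGS.get(flag, f"unknown flag {flag!r}") for flag in flags]


if __name__ == "__main__":
    console = "Pid:3864, comm: hald Tainted: GF M 2.6.9-5.ELsmp"
    for meaning in decode_taint(console):
        print(meaning)
```

Parsing letter by letter (rather than by fixed column) is deliberate, because the console transcription collapses the blank positions the kernel prints for unset flags.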
 
02-13-2008, 10:06 AM   #2
slacksite
What are the physical specs of the CPUs on this server?

Newer processors require later versions of RHEL4 to work properly.

In particular, some OS changes were made after RHEL4U5 to address Clovertown and Harpertown CPUs. On earlier releases, odd things could happen, including MCEs, particularly on 64-bit systems. Based on the addresses in your stack trace, you are running 64-bit as well.
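The 64-bit inference works because the addresses in the call trace, such as ffffffff8011ba9a, are 16 hex digits (64 bits) wide and lie far above the 0xFFFFFFFF ceiling of a 32-bit address space. A small Python sketch of that check, run on a few entries adapted from the trace in the first post:

```python
import re

# A few call-trace entries adapted from the console output in the first post.
TRACE = """
<ffffffff8011ba9a>{smp_really_stop_cpu+0} <ffffffff8011bad7>{smp_send_stop+52}
<ffffffff80135106>{panic+235} <ffffffff8011744f>{print_mce+159}
"""

# Lowercase hex inside angle brackets; markers like <EOE> do not match.
ADDRESS = re.compile(r"<([0-9a-f]+)>")


def looks_64bit(trace: str) -> bool:
    """True if every bracketed address needs more than 32 bits to represent."""
    addresses = [int(h, 16) for h in ADDRESS.findall(trace)]
    return bool(addresses) and all(a > 0xFFFFFFFF for a in addresses)


if __name__ == "__main__":
    for hex_addr in ADDRESS.findall(TRACE):
        print(hex_addr, len(hex_addr) * 4, "bits")
    print("64-bit kernel addresses:", looks_64bit(TRACE))
```

A 32-bit kernel would instead print 8-digit addresses such as c0123456, for which the check returns False.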

It's also probably worth mentioning that you are running an OS that is 2 years old (RHEL4 GA). I would *strongly* recommend updating to the latest RHEL4 update and errata.
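To answer the CPU question and confirm which kernel is actually running, both details can be read on the server itself: "grep 'model name' /proc/cpuinfo" and "uname -r" do this from the shell. A minimal Python 3 equivalent is sketched below; note that the Python shipped with RHEL4 is much older than Python 3, so this assumes a newer interpreter is available or that the script is adapted.

```python
import platform
from collections import Counter


def cpu_models(cpuinfo_text: str) -> Counter:
    """Count logical CPUs per 'model name' entry in /proc/cpuinfo text."""
    models = Counter()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model name":
            models[value.strip()] += 1
    return models


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not running on Linux
    for model, count in cpu_models(text).items():
        print(f"{count} x {model}")
    # Kernel release, e.g. 2.6.9-5.ELsmp on the server in this thread.
    print("kernel:", platform.release())
    print("machine:", platform.machine())
```

The kernel release string is the quickest way to tell whether the box is still on the GA kernel after an update has been applied.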
 
02-14-2008, 10:38 AM   #3
ajatiti (Original Poster)
The processor is an Intel Xeon 5150 (2.66 GHz, 64-bit).

What is the latest update for RHEL4?
Where can I get it, and will applying it have any impact on the running server?
I appreciate your help!
 
02-15-2008, 10:26 AM   #4
slacksite
RHEL4U6 is the latest update release, but there are already errata for RHEL4U6.

Do you have a subscription to RHN? If so, take a look at the following KB articles:

http://kbase.redhat.com/faq/FAQ_80_4293.shtm

http://kbase.redhat.com/faq/FAQ_80_3929.shtm
 
02-18-2008, 01:50 PM   #5
ajatiti (Original Poster)
Thanks a ton!
All times are GMT -5.