LinuxQuestions.org


spayce 05-29-2007 03:01 AM

Urgent: RHEL4 64bit keeps crashing
 
I am a total Linux noob, but my server keeps crashing. It looks like a hard drive error, but I don't have any SMART errors, and the array seems OK.

I'm looking for some help. This keeps happening on my RHEL4 64-bit hugemem machine running VMware, and it takes down the server.

I am running a Supermicro box with dual quad-core processors and 16 GB of memory.

There are two arrays: one mirror set for the system and a RAID 10 for the VM volume.

The machine halted: no console, no SSH, and it only responds to ping. I see this on all the virtual terminals:

ext3-fs error (device dm-0) in start_transaction:Journal has aborted

After several hours or several days, it happens again.


Here is a section of /var/log/messages.

It is interesting to note all the hda and IDE errors, and how at 00:32:12 last night the log just stops and then picks up again at 22:39 when I rebooted. After the reboot, the exact same errors start occurring again.


I'm looking for suggestions on next steps.

May 28 00:32:10 vmserver01 kernel: hda: packet command error: status=0x51 { DriveReady SeekComplete Error }

May 28 00:32:10 vmserver01 kernel: hda: packet command error: error=0x54

May 28 00:32:10 vmserver01 kernel: ide: failed opcode was 100

May 28 00:32:10 vmserver01 kernel: ATAPI device hda:

May 28 00:32:10 vmserver01 kernel: Error: Illegal request -- (Sense key=0x05)

May 28 00:32:10 vmserver01 kernel: Cannot read medium - incompatible format -- (asc=0x30, ascq=0x02)

May 28 00:32:10 vmserver01 kernel: The failed "Read Subchannel" packet command was:

May 28 00:32:10 vmserver01 kernel: "42 02 40 01 00 00 00 00 10 00 00 00 00 00 00 00 "

May 28 00:32:11 vmserver01 kernel: hda: packet command error: status=0x51 { DriveReady SeekComplete Error }

May 28 00:32:11 vmserver01 kernel: hda: packet command error: error=0x54

May 28 00:32:11 vmserver01 kernel: ide: failed opcode was 100

May 28 00:32:11 vmserver01 kernel: ATAPI device hda:

May 28 00:32:11 vmserver01 kernel: Error: Illegal request -- (Sense key=0x05)

May 28 00:32:11 vmserver01 kernel: Cannot read medium - incompatible format -- (asc=0x30, ascq=0x02)

May 28 00:32:11 vmserver01 kernel: The failed "Read Subchannel" packet command was:

May 28 00:32:11 vmserver01 kernel: "42 02 40 01 00 00 00 00 10 00 00 00 00 00 00 00 "

May 28 00:32:12 vmserver01 kernel: hda: packet command error: status=0x51 { DriveReady SeekComplete Error }

May 28 00:32:12 vmserver01 kernel: hda: packet command error: error=0x54

May 28 00:32:12 vmserver01 kernel: ide: failed opcode was 100

May 28 00:32:12 vmserver01 kernel: ATAPI device hda:

May 28 00:32:12 vmserver01 kernel: Error: Illegal request -- (Sense key=0x05)

May 28 00:32:12 vmserver01 kernel: Cannot read medium - incompatible format -- (asc=0x30, ascq=0x02)

May 28 00:32:12 vmserver01 kernel: The failed "Read Subchannel" packet command was:

May 28 00:32:12 vmserver01 kernel: "42 02 40 01 00 00 00 00 10 00 00 00 00 00 00 00 "

May 28 22:39:38 vmserver01 syslogd 1.4.1: restart.

May 28 22:39:38 vmserver01 syslog: syslogd startup succeeded

May 28 22:39:38 vmserver01 kernel: klogd 1.4.1, log source = /proc/kmsg started.

May 28 22:39:38 vmserver01 syslog: klogd startup succeeded

May 28 22:39:38 vmserver01 kernel: Linux version 2.6.9-5.ELhugemem (bhcompile@decompose.build.redhat.com) (gcc version 3.4.3 20041212 (Red Hat 3.4.3-9.EL4)) #1 SMP Wed Jan 5 19:38:36 EST 2005
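
The gap described above, where the log stops at 00:32:12 and resumes at 22:39:38, can also be located mechanically. The following is only a minimal Python sketch and not something from the original post: the function name find_gaps, the 30-minute threshold, and the assumed year 2007 (syslog lines carry no year) are all illustrative choices.

```python
from datetime import datetime, timedelta


def find_gaps(lines, year=2007, threshold=timedelta(minutes=30)):
    """Return (previous, current) timestamp pairs that are further apart than threshold.

    The year is an assumption: classic syslog timestamps do not include one.
    """
    gaps = []
    previous = None
    for line in lines:
        stamp = line[:15]  # e.g. "May 28 00:32:12"
        try:
            current = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip blank lines and anything without a syslog timestamp
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current))
        previous = current
    return gaps


# Three lines taken from the excerpt above.
sample = [
    "May 28 00:32:11 vmserver01 kernel: ide: failed opcode was 100",
    "May 28 00:32:12 vmserver01 kernel: ide: failed opcode was 100",
    "May 28 22:39:38 vmserver01 syslogd 1.4.1: restart.",
]

for before, after in find_gaps(sample):
    print(f"no log entries between {before} and {after} ({after - before})")
```

To run it against the real file, something like find_gaps(open("/var/log/messages")) should work, assuming the file is readable and spans a single year. A long silence followed by a syslogd restart line marks the window in which the machine was hung.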

twantrd 05-29-2007 11:15 PM

Well, from looking at the logs, yes, it appears to be a hard-drive error. However, you stated that hda is part of a RAID container. I'm assuming hda is RAID 1. Therefore, if there was a hard-drive error, it's no big deal, as it's part of a mirror anyway. Perhaps the controller is faulty or dying. Does the server come with any diagnostic tools (or a CD, like Dell provides) that you can run to perform a system component health check?

It would probably also help if you had another identical system, installed RHEL on it, and checked whether it shows the same behavior. However, I understand if you don't.

-twantrd
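
For anyone reading the status=0x51 and error=0x54 values in the log above, a small sketch of how those bytes break down may help. It assumes the conventional ATA status register bit layout and the ATAPI convention of carrying the sense key in the upper four bits of the error register. The log's own text ({ DriveReady SeekComplete Error } and Sense key=0x05) is consistent with that reading. The function names are made up for illustration.

```python
# ATA status register bits, highest bit first (assumed conventional layout).
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

# Standard sense key names (only the common ones are listed here).
SENSE_KEYS = {
    0x0: "No sense",
    0x1: "Recovered error",
    0x2: "Not ready",
    0x3: "Medium error",
    0x4: "Hardware error",
    0x5: "Illegal request",
    0x6: "Unit attention",
    0x7: "Data protect",
    0xB: "Aborted command",
}


def decode_status(value):
    """List the names of the bits set in an ATA status byte."""
    return [name for mask, name in STATUS_BITS if value & mask]


def decode_atapi_error(value):
    """Split an ATAPI error byte into (sense key, sense key name, flag names)."""
    key = value >> 4
    flags = []
    if value & 0x04:
        flags.append("AbortedCommand")
    if value & 0x02:
        flags.append("EndOfMedia")
    if value & 0x01:
        flags.append("IllegalLengthIndication")
    return key, SENSE_KEYS.get(key, "unknown"), flags


print(decode_status(0x51))       # ['DriveReady', 'SeekComplete', 'Error']
print(decode_atapi_error(0x54))  # (5, 'Illegal request', ['AbortedCommand'])
```

Note that the log labels hda as an ATAPI device and the failed command as "Read Subchannel", so it may be worth confirming what kind of device hda actually is before treating these lines as errors from an array member.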


All times are GMT -5.