I am a total Linux noob, but my server keeps crashing. It looks like a hard drive error, but I don't have any SMART errors, and the array seems OK.
Looking for some help. This keeps happening to my RHEL4 64-bit Hugemem machine running VMware, and it takes down the server.
I am running a Supermicro box with dual quad-core processors and 16GB of memory.
There are two arrays: one mirror set for the system and a RAID 10 for the VM volume.
The machine halted: no console, no ssh, and it only responds to ping. I see this in all the virtual terminals:
ext3-fs error (device dm-0) in start_transaction:Journal has aborted
and after several hours or several days it happens again.
Here is a section of /var/log/messages:
It is interesting to note all the hda and IDE errors, and how at 12:32:12 last night the log just stops and then picks up again at 22:39 when I rebooted it. Also, after the reboot, the exact same errors start occurring again.
Well, from looking at the logs, yes, it appears to be a hard drive error. However, you stated that hda is part of a RAID container, and I'm assuming hda is RAID 1. If so, a hard drive error should be no big deal, as the drive is part of a mirror anyway. Perhaps the controller is faulty or dying. Does the server come with any diagnostic tools (for example on a CD, like Dell provides) that you can run to perform a system component health check?
It would probably also help if you had another identical system, installed RHEL on it, and checked whether it shows the same behavior. However, I understand if you don't.
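In the meantime, it may help to pull just the disk-related lines out of /var/log/messages and to see exactly where the log goes silent before each reboot. Below is a rough Python sketch of that idea, not a definitive tool. The search pattern is only my guess at what the kernel prints for IDE and ext3 problems, the function names are made up for illustration, and it assumes the classic syslog timestamp prefix ("Mon DD HH:MM:SS"), which has no year, so you have to pass the year yourself.

```python
import re
from datetime import datetime, timedelta

# Assumed keywords for IDE/ext3 trouble; adjust to match what your log really contains.
PATTERN = re.compile(
    r"\bhd[a-z]\b|\bide\d*\b|ext3-fs error|journal has aborted",
    re.IGNORECASE,
)


def disk_error_lines(lines):
    """Return only the lines that look disk- or filesystem-related."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]


def parse_time(line, year):
    """Parse the syslog prefix 'Mon DD HH:MM:SS'; return None if it does not match."""
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def log_gaps(lines, year, min_gap=timedelta(hours=1)):
    """Return (last_seen, next_seen) pairs where the log was silent for min_gap or longer."""
    gaps = []
    prev = None
    for line in lines:
        ts = parse_time(line, year)
        if ts is None:
            continue  # skip lines without a recognizable timestamp
        if prev is not None and ts - prev >= min_gap:
            gaps.append((prev, ts))
        prev = ts
    return gaps


def report(path, year):
    """Print suspicious lines and silent periods for one log file."""
    with open(path, errors="replace") as fh:
        lines = fh.readlines()
    for line in disk_error_lines(lines):
        print(line)
    for start, end in log_gaps(lines, year):
        print(f"log silent from {start} to {end} ({end - start})")
```

Running `report("/var/log/messages", year)` as root, with the year the log was written, should list the hda/IDE/ext3 lines followed by any stretch where nothing was logged for an hour or more. If the last lines before each silent period are always the same controller or drive messages, that would point more toward the hardware than toward the filesystem itself.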