LinuxQuestions.org - Redhat cluster malfunctioning

Hi experts,

I am new to this forum. I am here because of a production server cluster issue that has me completely stuck. I am new to Red Hat Cluster as well.

This is a 2-node cluster. The operating system installed on both nodes is RHEL 5.3.

The system was running fine until last week, when one of the nodes (node2) suddenly went offline.

All the cluster-related services were hung and the server could not be rebooted. I had to kill the rgmanager service to reboot the server. The system then rebooted and came up in cluster mode, which took the other node (node1) offline.

All I understood from this is that the cluster is unable to keep both nodes online simultaneously. The same thing happened when I rebooted node1, which killed node2 when it came back up.

I have now kept node2 down so that the production application installed on this server can keep running.

Looking forward to your valuable reply, as this is a serious concern for me in a production environment.

The logs from node1 at the time node2 was booted into the cluster are pasted here for your reference.
MESSAGE FILE OUTPUT
---------------------

Feb 2 15:06:39 htbapp1 openais[3840]: [SYNC ] This node is within the primary component and will provide service.
Feb 2 15:06:39 htbapp1 kernel: Intel(R) Xeon(R) CPU E5520 @ 2.27GHz stepping 05
Feb 2 15:06:39 htbapp1 openais[3840]: [TOTEM] entering OPERATIONAL state.
Feb 2 15:06:39 htbapp1 kernel: Brought up 8 CPUs
Feb 2 15:06:39 htbapp1 openais[3840]: [MAIN ] Killing node htbapp2.ksebnet.com because it has rejoined the cluster with existing state
Feb 2 15:06:39 htbapp1 kernel: testing NMI watchdog ... OK.
Feb 2 15:06:40 htbapp1 kernel: time.c: Using 14.318180 MHz WALL HPET GTOD HPET/TSC timer.
Feb 2 15:06:40 htbapp1 kernel: time.c: Detected 2266.835 MHz processor.
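
The kernel boot messages and the openais messages are interleaved in the output above. In case it helps anyone reading, here is a minimal Python sketch for pulling out only the openais lines. The regular expression, the openais_events helper name, and the sample list are my own illustration (assumptions based on the line format seen above), not part of any cluster tool.

```python
import re

# Matches syslog lines written by the openais daemon, for example:
# "Feb 2 15:06:39 htbapp1 openais[3840]: [TOTEM] entering OPERATIONAL state."
# The subsystem tag may be padded with a space, as in "[SYNC ]" or "[MAIN ]".
OPENAIS_LINE = re.compile(
    r"openais\[\d+\]: \[(?P<subsystem>[A-Z]+)\s*\] (?P<message>.*)"
)


def openais_events(lines):
    """Return (subsystem, message) pairs for openais lines, skipping all others."""
    events = []
    for line in lines:
        match = OPENAIS_LINE.search(line)
        if match:
            events.append((match.group("subsystem"), match.group("message")))
    return events


# Hypothetical input: three lines copied from the paste above.
sample = [
    "Feb 2 15:06:39 htbapp1 openais[3840]: [SYNC ] This node is within the primary component and will provide service.",
    "Feb 2 15:06:39 htbapp1 kernel: Brought up 8 CPUs",
    "Feb 2 15:06:39 htbapp1 openais[3840]: [MAIN ] Killing node htbapp2.ksebnet.com because it has rejoined the cluster with existing state",
]

for subsystem, message in openais_events(sample):
    print(subsystem, "-", message)
```

Run against the full messages file (for example by passing an open file object instead of the sample list), this leaves only the cluster messages, including the "Killing node" line from the [MAIN ] subsystem.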