Red Hat cluster malfunctioning
I am new to this forum. I am here because of an issue with a production server cluster that has me completely unsettled. I am also new to Red Hat Cluster.
This is a 2-node cluster, and the operating system installed on both nodes is RHEL 5.3.
The system was running fine until last week, when one of the two nodes (node2) suddenly went offline.
All the cluster-related services were hung and the server could not be rebooted normally. I had to kill the rgmanager service to reboot it. The server then rebooted and came up in cluster mode, which took the other node (node1) offline.
All I could understand from this is that the cluster is unable to keep both nodes online at the same time. The same thing happened when I rebooted node1: when it came back up, node2 was killed.
I have now kept node2 down so that the production application installed on this server can keep running.
I look forward to your replies, as this is a serious concern for me in a production environment.
Logs from node1, captured when node2 was booted into the cluster, are pasted here for reference.
MESSAGE FILE OUTPUT
Feb 2 15:06:39 htbapp1 openais: [SYNC ] This node is within the primary component and will provide service.
Feb 2 15:06:39 htbapp1 kernel: Intel(R) Xeon(R) CPU E5520 @ 2.27GHz stepping 05
Feb 2 15:06:39 htbapp1 openais: [TOTEM] entering OPERATIONAL state.
Feb 2 15:06:39 htbapp1 kernel: Brought up 8 CPUs
Feb 2 15:06:39 htbapp1 openais: [MAIN ] Killing node htbapp2.ksebnet.com because it has rejoined the cluster with existing state
Feb 2 15:06:39 htbapp1 kernel: testing NMI watchdog ... OK.
Feb 2 15:06:40 htbapp1 kernel: time.c: Using 14.318180 MHz WALL HPET GTOD HPET/TSC timer.
Feb 2 15:06:40 htbapp1 kernel: time.c: Detected 2266.835 MHz processor.