LinuxQuestions.org


sree.m 02-15-2012 10:46 AM

Redhat cluster malfunctioning
 
Hi experts,

I am new to this forum. I am here because of a production server cluster issue that has me seriously worried. I am new to Red Hat Cluster as well.

This is a 2-node cluster, and the operating system installed on the nodes is RHEL 5.3.

The system was running fine until last week, when things changed all of a sudden: one of the nodes (node2) in the cluster went offline.

All the cluster-related services were hung and the server could not be rebooted. I had to kill the rgmanager service to reboot the server. The system then rebooted and came up in cluster mode, which took the other node (node1) offline.

All I understood from this was that the cluster was unable to keep both nodes online simultaneously. The same happened when I rebooted node1: once node1 came back up, it killed node2.

I have now kept node2 down in order to run the production application installed on this server.

Looking forward to your valuable reply, as this is a serious concern for me in a production environment.

Logs from node1, taken when node2 was booted into the cluster, are pasted here for your reference.
MESSAGE FILE OUTPUT
---------------------

Feb 2 15:06:39 htbapp1 openais[3840]: [SYNC ] This node is within the primary component and will provide service.
Feb 2 15:06:39 htbapp1 kernel: Intel(R) Xeon(R) CPU E5520 @ 2.27GHz stepping 05
Feb 2 15:06:39 htbapp1 openais[3840]: [TOTEM] entering OPERATIONAL state.
Feb 2 15:06:39 htbapp1 kernel: Brought up 8 CPUs
Feb 2 15:06:39 htbapp1 openais[3840]: [MAIN ] Killing node htbapp2.ksebnet.com because it has rejoined the cluster with existing state
Feb 2 15:06:39 htbapp1 kernel: testing NMI watchdog ... OK.
Feb 2 15:06:40 htbapp1 kernel: time.c: Using 14.318180 MHz WALL HPET GTOD HPET/TSC timer.
Feb 2 15:06:40 htbapp1 kernel: time.c: Detected 2266.835 MHz processor.
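
The line that matters in the output above is the openais "Killing node" message, which is easy to miss among the kernel boot messages. A minimal Python sketch for pulling such lines out of a copy of the log follows. The function name find_node_kills is made up for illustration, and the pattern is based only on the single log line shown in this thread, so other openais versions may word the message differently.

```python
import re

# Pattern modelled on the one "Killing node" line in the log above.
# Assumption: other openais releases may format this message differently.
KILL_RE = re.compile(
    r"openais\[\d+\]: \[MAIN\s*\] Killing node (?P<node>\S+) because (?P<reason>.+)$"
)


def find_node_kills(lines):
    """Return (node, reason) pairs for every 'Killing node' message found."""
    hits = []
    for line in lines:
        match = KILL_RE.search(line.rstrip("\n"))
        if match:
            hits.append((match.group("node"), match.group("reason")))
    return hits
```

To use it, pass any iterable of log lines, for example an open file object for a copy of /var/log/messages (the usual syslog file on RHEL 5; adjust the path if your syslog configuration differs).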

kbp 02-22-2012 05:04 PM

If this is a production issue, why haven't you raised a ticket with Red Hat support? They are the experts on this technology. Personally, I wouldn't run production without support unless I really knew what I was doing.

sree.m 04-18-2012 01:04 AM

Hi Guys,

This issue has been resolved! The culprit was the "acpid" (power management) daemon, which is not supposed to be running on cluster nodes and which caused them to malfunction. The cluster started working perfectly after acpid was disabled at startup.

Many thanks for your efforts and help.

Regards,
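
For anyone checking their own nodes: on RHEL 5 the usual way to stop the daemon and keep it from starting at boot is "service acpid stop" followed by "chkconfig acpid off", and "chkconfig --list acpid" shows the per-runlevel state. The thread does not say which commands were actually used, so treat that as an assumption. Below is a small Python sketch for checking one line of that listing; the helper name enabled_runlevels and the sample line in the note after it are illustrative, not taken from the original servers.

```python
def enabled_runlevels(chkconfig_line):
    """Return the runlevels marked 'on' in one line of `chkconfig --list` output.

    Expected shape (illustrative): "acpid  0:off 1:off 2:on 3:on 4:on 5:on 6:off"
    """
    fields = chkconfig_line.split()
    levels = []
    # The first field is the service name; the rest are "<runlevel>:<state>".
    for field in fields[1:]:
        level, _, state = field.partition(":")
        if level.isdigit() and state == "on":
            levels.append(int(level))
    return levels
```

An empty list for the acpid line means the daemon will not start at boot in any runlevel. For example, a line such as "acpid 0:off 1:off 2:on 3:on 4:on 5:on 6:off" would still have it enabled in runlevels 2 through 5.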
Sree

