LinuxQuestions.org

calicowboy54 03-16-2012 12:22 PM

Red Hat Cluster Problem
 
I have a problem with fencing on my two-node Red Hat 5.6 cluster.

The problem is that at random times the cluster de-clusters and then shuts down the system and powers it off. From what I can tell, this happens during the fencing process. Here is a sample from the log; the other server's logs just say the system is going down at the same timestamp:

Mar 15 16:41:13 sys1-cmdb2 openais[3955]: [TOTEM] entering GATHER state from 11.
Mar 15 16:41:13 sys1-cmdb2 openais[3955]: [TOTEM] Creating commit token because I am the rep.
Mar 15 16:41:13 sys1-cmdb2 openais[3955]: [TOTEM] entering RECOVERY state.
Mar 15 16:41:13 sys1-cmdb2 openais[3955]: [TOTEM] position [0] member 10.10.10.2:
Mar 15 16:41:13 sys1-cmdb2 openais[3955]: [TOTEM] previous ring seq 188 rep 10.10.10.1
Mar 15 16:41:13 sys1-cmdb2 openais[3955]: [TOTEM] aru 631 high delivered 631 received flag 1
Mar 15 16:41:13 sys1-cmdb2 openais[3955]: [TOTEM] Did not need to originate any messages in recovery.
Mar 15 16:41:13 sys1-cmdb2 openais[3955]: [TOTEM] Sending initial ORF token
Mar 15 16:41:13 sys1-cmdb2 openais[3955]: [CLM ] CLM CONFIGURATION CHANGE
Mar 15 16:41:13 sys1-cmdb2 openais[3955]: [CLM ] New Configuration:
Mar 15 16:41:13 sys1-cmdb2 openais[3955]: [CLM ] r(0) ip(10.10.10.2)
Mar 15 16:41:13 sys1-cmdb2 kernel: dlm: closing connection to node 1
Mar 15 16:41:13 sys1-cmdb2 fenced[3974]: sys1-cmdb1. not a cluster member after 0 sec post_fail_delay
Mar 15 16:41:13 sys1-cmdb2 openais[3955]: [CLM ] Members Left:
Mar 15 16:41:13 sys1-cmdb2 fenced[3974]: fencing node "sys1-cmdb1."
Mar 15 16:41:13 sys1-cmdb2 openais[3955]: [CLM ] r(0) ip(10.10.10.1)
Mar 15 16:41:13 sys1-cmdb2 openais[3955]: [CLM ] Members Joined:
Mar 15 16:41:14 sys1-cmdb2 openais[3955]: [CLM ] CLM CONFIGURATION CHANGE
Mar 15 16:41:14 sys1-cmdb2 openais[3955]: [CLM ] New Configuration:
Mar 15 16:41:14 sys1-cmdb2 openais[3955]: [CLM ] r(0) ip(10.10.10.2)
Mar 15 16:41:14 sys1-cmdb2 openais[3955]: [CLM ] Members Left:
Mar 15 16:41:14 sys1-cmdb2 openais[3955]: [CLM ] Members Joined:
Mar 15 16:41:14 sys1-cmdb2 openais[3955]: [SYNC ] This node is within the primary component and will provide service.
Mar 15 16:41:14 sys1-cmdb2 openais[3955]: [TOTEM] entering OPERATIONAL state.
Mar 15 16:41:14 sys1-cmdb2 openais[3955]: [CLM ] got nodejoin message 10.10.10.2
Mar 15 16:41:14 sys1-cmdb2 openais[3955]: [CPG ] got joinlist message from node 2
Mar 15 16:41:14 sys1-cmdb2 openais[3955]: [TOTEM] entering GATHER state from 9.
Mar 15 16:41:14 sys1-cmdb2 openais[3955]: [TOTEM] Storing new sequence id for ring c4
Mar 15 16:41:14 sys1-cmdb2 openais[3955]: [TOTEM] entering COMMIT state.
Mar 15 16:41:14 sys1-cmdb2 openais[3955]: [TOTEM] entering RECOVERY state.
Mar 15 16:41:14 sys1-cmdb2 openais[3955]: [TOTEM] position [0] member 10.10.10.1:
Mar 15 16:41:14 sys1-cmdb2 openais[3955]: [TOTEM] previous ring seq 192 rep 10.10.10.1
Mar 15 16:41:14 sys1-cmdb2 openais[3955]: [TOTEM] aru 11 high delivered 11 received flag 1
Mar 15 16:41:14 sys1-cmdb2 openais[3955]: [TOTEM] position [1] member 10.10.10.2:
Mar 15 16:41:14 sys1-cmdb2 gfs_controld[3986]: cluster is down, exiting
Mar 15 16:41:14 sys1-cmdb2 clurgmgrd[10489]: <warning> #67: Shutting down uncleanly
Mar 15 16:41:14 sys1-cmdb2 dlm_controld[3980]: cluster is down, exiting
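
To line up what happened on each node, the fencing-related lines can be pulled out of the log with a short script. This is only a rough sketch: the keyword list is copied from the excerpt above, the function name `fencing_timeline` is made up for illustration, and it assumes every line follows the syslog layout seen in this log.

```python
import re

# Syslog layout seen in the excerpt: "Mar 15 16:41:13 host proc[pid]: message"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"(?P<proc>[\w.-]+)(?:\[\d+\])?:\s+(?P<msg>.*)$"
)

# Substrings taken from the log above that mark membership loss,
# a fence request, or a cluster daemon exiting.
KEYWORDS = (
    "not a cluster member",
    "fencing node",
    "closing connection to node",
    "cluster is down",
    "Shutting down uncleanly",
)


def fencing_timeline(lines):
    """Return (timestamp, process, message) for each line matching KEYWORDS."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and any(key in match.group("msg") for key in KEYWORDS):
            events.append(
                (match.group("ts"), match.group("proc"), match.group("msg"))
            )
    return events


if __name__ == "__main__":
    # A few lines copied from the log above; a real run would read the
    # whole log file from each node instead.
    sample = [
        "Mar 15 16:41:13 sys1-cmdb2 openais[3955]: [TOTEM] entering RECOVERY state.",
        "Mar 15 16:41:13 sys1-cmdb2 kernel: dlm: closing connection to node 1",
        'Mar 15 16:41:13 sys1-cmdb2 fenced[3974]: fencing node "sys1-cmdb1."',
        "Mar 15 16:41:14 sys1-cmdb2 gfs_controld[3986]: cluster is down, exiting",
    ]
    for ts, proc, msg in fencing_timeline(sample):
        print(ts, proc, msg)
```

Running the same filter over the log from the other node should make it easier to see which node lost the other first and which one issued the fence request.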

John VV 03-18-2012 01:24 AM

Quote:

and then shuts down the system, powers it off
Have you checked the CPU temperature?
It might be overheating.

calicowboy54 03-19-2012 11:15 AM

Quote:

Originally Posted by John VV (Post 4629570)
Have you checked the CPU temperature?
It might be overheating.

Yes, I checked the CPU temperature through the iLO port and there is no overheating. The only thing showing in the iLO logs is a power failure on power supply 02, but that one is not even hooked up. There is no power failure on power supply 01; the system simply stopped and powered down.

Satyaveer Arya 03-19-2012 12:17 PM

You are using Red Hat and paying for it, right? Then you can open a case with Red Hat support: https://www.redhat.com/wapps/sso/log...port/cases/new (you need to log in first).


All times are GMT -5.