LinuxQuestions.org

rajaniyer123 08-06-2012 09:25 AM

RHEL Cluster - Node 2 Auto Reboot
 
Hi,

Node 2 of our two-node RHEL cluster has rebooted on its own 5-7 times in the last 3-4 days.


Below is the /var/log/messages output from node 2 around one of these reboots.


Aug  4 20:20:27 node2-hostname openais[2707]: [TOTEM] The token was lost in the OPERATIONAL state.
Aug  4 20:20:27 node2-hostname openais[2707]: [TOTEM] Receive multicast socket recv buffer size (320000 bytes).
Aug  4 20:20:27 node2-hostname openais[2707]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Aug  4 20:20:27 node2-hostname openais[2707]: [TOTEM] entering GATHER state from 2.
Aug  4 20:20:27 node2-hostname openais[2707]: [TOTEM] Storing new sequence id for ring 7cc
Aug  4 20:20:27 node2-hostname openais[2707]: [TOTEM] entering COMMIT state.
Aug  4 20:20:27 node2-hostname openais[2707]: [TOTEM] entering RECOVERY state.
Aug  4 20:20:27 node2-hostname openais[2707]: [TOTEM] position [0] member node1:
Aug  4 20:20:27 node2-hostname openais[2707]: [TOTEM] previous ring seq 1992 rep node1
Aug  4 20:20:27 node2-hostname openais[2707]: [TOTEM] aru f high delivered f received flag 1
Aug  4 20:20:27 node2-hostname openais[2707]: [TOTEM] position [1] member node2:
Aug  4 20:20:27 node2-hostname openais[2707]: [TOTEM] previous ring seq 1988 rep node1
Aug  4 20:20:27 node2-hostname openais[2707]: [TOTEM] aru 57 high delivered 57 received flag 1
Aug  4 20:20:27 node2-hostname openais[2707]: [TOTEM] Did not need to originate any messages in recovery.
Aug  4 20:20:27 node2-hostname openais[2707]: [CLM  ] CLM CONFIGURATION CHANGE
Aug  4 20:20:27 node2-hostname openais[2707]: [CLM  ] New Configuration:
Aug  4 20:20:27 node2-hostname openais[2707]: [CLM  ]     r(0) ip(node1)
Aug  4 20:20:27 node2-hostname openais[2707]: [CLM  ]     r(0) ip(node2)
Aug  4 20:20:27 node2-hostname openais[2707]: [CLM  ] Members Left:
Aug  4 20:20:27 node2-hostname openais[2707]: [CLM  ] Members Joined:
Aug  4 20:20:27 node2-hostname openais[2707]: [CLM  ] CLM CONFIGURATION CHANGE
Aug  4 20:20:27 node2-hostname openais[2707]: [CLM  ] New Configuration:
Aug  4 20:20:27 node2-hostname openais[2707]: [CLM  ]     r(0) ip(node1)
Aug  4 20:20:27 node2-hostname openais[2707]: [CLM  ]     r(0) ip(node2)
Aug  4 20:20:27 node2-hostname openais[2707]: [CLM  ] Members Left:
Aug  4 20:20:27 node2-hostname openais[2707]: [CLM  ] Members Joined:
Aug  4 20:20:27 node2-hostname openais[2707]: [SYNC ] This node is within the primary component and will provide service.
Aug  4 20:20:27 node2-hostname openais[2707]: [TOTEM] entering OPERATIONAL state.
Aug  4 20:20:27 node2-hostname xinetd[2982]: START: nrpe pid=1317 from=10.105.32.115
Aug  4 20:20:28 node2-hostname openais[2707]: [CMAN ] cman killed by node 1 because we rejoined the cluster without a full restart
Aug  4 20:20:28 node2-hostname openais[2707]: [CLM  ] got nodejoin message node1
Aug  4 20:20:28 node2-hostname openais[2707]: [CLM  ] got nodejoin message node2
Aug  4 20:20:28 node2-hostname openais[2707]: [CPG  ] got joinlist message from node 1
Aug  4 20:20:28 node2-hostname openais[2707]: [CPG  ] got joinlist message from node 2
Aug  4 20:20:28 node2-hostname dlm_controld[2733]: cluster is down, exiting
Aug  4 20:20:28 node2-hostname gfs_controld[2739]: groupd_dispatch error -1 errno 0
Aug  4 20:20:28 node2-hostname gfs_controld[2739]: groupd connection died
Aug  4 20:20:28 node2-hostname gfs_controld[2739]: cluster is down, exiting
Aug  4 20:20:28 node2-hostname clurgmgrd[3630]: <warning> #67: Shutting down uncleanly
Aug  4 20:20:29 node2-hostname fenced[2727]: cluster is down, exiting
Aug  4 20:20:29 node2-hostname kernel: dlm: closing connection to node 2
Aug  4 20:20:29 node2-hostname kernel: dlm: closing connection to node 1
Aug  4 20:20:32 node2-hostname xinetd[2982]: EXIT: nrpe status=0 pid=1317 duration=5(sec)
Aug  4 20:20:43 node2-hostname clurgmgrd[3630]: <notice> Disconnecting from CMAN
Aug  4 20:20:43 node2-hostname clurgmgrd[3630]: <notice> Exiting
Aug  4 20:20:57 node2-hostname ccsd[2699]: Unable to connect to cluster infrastructure after 30 seconds.
Aug  4 20:21:27 node2-hostname ccsd[2699]: Unable to connect to cluster infrastructure after 60 seconds.
Aug  4 20:21:57 node2-hostname ccsd[2699]: Unable to connect to cluster infrastructure after 90 seconds.
Aug  4 20:22:27 node2-hostname ccsd[2699]: Unable to connect to cluster infrastructure after 120 seconds.


Please suggest what could be causing this.

deadeyes 08-07-2012 06:28 AM

Code:

Aug  4 20:20:28 node2-hostname gfs_controld[2739]: groupd_dispatch error -1 errno 0
Aug  4 20:20:28 node2-hostname gfs_controld[2739]: groupd connection died

Maybe check why this fails?
Is this an active/passive cluster or ...?

I have worked with RHCluster a bit, but I have never been happy with it.

Code:

Aug  4 20:20:28 node2-hostname openais[2707]: [CMAN ] cman killed by node 1 because we rejoined the cluster without a full restart

The last four messages (the ccsd "Unable to connect to cluster infrastructure" lines) are probably because cman is no longer running.

Can you check logs on the other node as well?

Also, what does clustat show?
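
For comparing the logs on both nodes, something like the rough Python sketch below might help pull out just the interesting lines. It only looks for three messages that appear in the excerpt above (token loss, the cman kill, and the "cluster is down" exits). The event names and the `scan` function are made up for illustration, and the patterns are assumptions based on this one log, so adjust them for your setup.

```python
import re
import sys

# Patterns copied from messages seen in the log excerpt in this thread.
# The event names (dictionary keys) are hypothetical labels for illustration.
PATTERNS = {
    "token_lost": re.compile(r"\[TOTEM\] The token was lost"),
    "cman_killed": re.compile(r"\[CMAN\s*\] cman killed by node \d+"),
    "cluster_down": re.compile(r"cluster is down, exiting"),
}


def scan(lines):
    """Return (event_name, line) pairs for lines matching a known pattern."""
    events = []
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                events.append((name, line.rstrip("\n")))
                break  # one label per line is enough
    return events


# Only runs when a log file path is given, e.g.:
#   python scan_cluster_log.py /var/log/messages
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as fh:
        for name, line in scan(fh):
            print(f"{name:12} {line}")
```

Running it against /var/log/messages on node 1 and node 2 and lining up the timestamps should show whether the token loss is seen on both sides at the same moment.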

Mlnr492 07-22-2017 12:54 AM

Hi rajaniyer123, I am also facing the same problem. Could you please share the resolution?

