LinuxQuestions.org
Old 08-06-2012, 09:25 AM   #1
rajaniyer123
Member
 
Registered: Feb 2004
Location: BARODA, GUJARAT
Posts: 259

Rep: Reputation: 30
RHEL Cluster - Node 2 Auto Reboot


Hi,

I have a 2-node RHEL cluster in which node2 has rebooted on its own 5-7 times in the last 3-4 days.


Please see the /var/log/messages output from node2 during one of these events:


Aug  4 20:20:27 node2-hostname openais[2707]: [TOTEM] The token was lost in the OPERATIONAL state.
Aug  4 20:20:27 node2-hostname openais[2707]: [TOTEM] Receive multicast socket recv buffer size (320000 bytes).
Aug  4 20:20:27 node2-hostname openais[2707]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Aug  4 20:20:27 node2-hostname openais[2707]: [TOTEM] entering GATHER state from 2.
Aug  4 20:20:27 node2-hostname openais[2707]: [TOTEM] Storing new sequence id for ring 7cc
Aug  4 20:20:27 node2-hostname openais[2707]: [TOTEM] entering COMMIT state.
Aug  4 20:20:27 node2-hostname openais[2707]: [TOTEM] entering RECOVERY state.
Aug  4 20:20:27 node2-hostname openais[2707]: [TOTEM] position [0] member node1:
Aug  4 20:20:27 node2-hostname openais[2707]: [TOTEM] previous ring seq 1992 rep node1
Aug  4 20:20:27 node2-hostname openais[2707]: [TOTEM] aru f high delivered f received flag 1
Aug  4 20:20:27 node2-hostname openais[2707]: [TOTEM] position [1] member node2:
Aug  4 20:20:27 node2-hostname openais[2707]: [TOTEM] previous ring seq 1988 rep node1
Aug  4 20:20:27 node2-hostname openais[2707]: [TOTEM] aru 57 high delivered 57 received flag 1
Aug  4 20:20:27 node2-hostname openais[2707]: [TOTEM] Did not need to originate any messages in recovery.
Aug  4 20:20:27 node2-hostname openais[2707]: [CLM  ] CLM CONFIGURATION CHANGE
Aug  4 20:20:27 node2-hostname openais[2707]: [CLM  ] New Configuration:
Aug  4 20:20:27 node2-hostname openais[2707]: [CLM  ]       r(0) ip(node1)
Aug  4 20:20:27 node2-hostname openais[2707]: [CLM  ]       r(0) ip(node2)
Aug  4 20:20:27 node2-hostname openais[2707]: [CLM  ] Members Left:
Aug  4 20:20:27 node2-hostname openais[2707]: [CLM  ] Members Joined:
Aug  4 20:20:27 node2-hostname openais[2707]: [CLM  ] CLM CONFIGURATION CHANGE
Aug  4 20:20:27 node2-hostname openais[2707]: [CLM  ] New Configuration:
Aug  4 20:20:27 node2-hostname openais[2707]: [CLM  ]       r(0) ip(node1)
Aug  4 20:20:27 node2-hostname openais[2707]: [CLM  ]       r(0) ip(node2)
Aug  4 20:20:27 node2-hostname openais[2707]: [CLM  ] Members Left:
Aug  4 20:20:27 node2-hostname openais[2707]: [CLM  ] Members Joined:
Aug  4 20:20:27 node2-hostname openais[2707]: [SYNC ] This node is within the primary component and will provide service.
Aug  4 20:20:27 node2-hostname openais[2707]: [TOTEM] entering OPERATIONAL state.
Aug  4 20:20:27 node2-hostname xinetd[2982]: START: nrpe pid=1317 from=10.105.32.115
Aug  4 20:20:28 node2-hostname openais[2707]: [CMAN ] cman killed by node 1 because we rejoined the cluster without a full restart
Aug  4 20:20:28 node2-hostname openais[2707]: [CLM  ] got nodejoin message node1
Aug  4 20:20:28 node2-hostname openais[2707]: [CLM  ] got nodejoin message node2
Aug  4 20:20:28 node2-hostname openais[2707]: [CPG  ] got joinlist message from node 1
Aug  4 20:20:28 node2-hostname openais[2707]: [CPG  ] got joinlist message from node 2
Aug  4 20:20:28 node2-hostname dlm_controld[2733]: cluster is down, exiting
Aug  4 20:20:28 node2-hostname gfs_controld[2739]: groupd_dispatch error -1 errno 0
Aug  4 20:20:28 node2-hostname gfs_controld[2739]: groupd connection died
Aug  4 20:20:28 node2-hostname gfs_controld[2739]: cluster is down, exiting
Aug  4 20:20:28 node2-hostname clurgmgrd[3630]: <warning> #67: Shutting down uncleanly
Aug  4 20:20:29 node2-hostname fenced[2727]: cluster is down, exiting
Aug  4 20:20:29 node2-hostname kernel: dlm: closing connection to node 2
Aug  4 20:20:29 node2-hostname kernel: dlm: closing connection to node 1
Aug  4 20:20:32 node2-hostname xinetd[2982]: EXIT: nrpe status=0 pid=1317 duration=5(sec)
Aug  4 20:20:43 node2-hostname clurgmgrd[3630]: <notice> Disconnecting from CMAN
Aug  4 20:20:43 node2-hostname clurgmgrd[3630]: <notice> Exiting
Aug  4 20:20:57 node2-hostname ccsd[2699]: Unable to connect to cluster infrastructure after 30 seconds.
Aug  4 20:21:27 node2-hostname ccsd[2699]: Unable to connect to cluster infrastructure after 60 seconds.
Aug  4 20:21:57 node2-hostname ccsd[2699]: Unable to connect to cluster infrastructure after 90 seconds.
Aug  4 20:22:27 node2-hostname ccsd[2699]: Unable to connect to cluster infrastructure after 120 seconds.


Please suggest what could be causing this.
 
Old 08-07-2012, 06:28 AM   #2
deadeyes
Member
 
Registered: Aug 2006
Posts: 609

Rep: Reputation: 79
Code:
Aug  4 20:20:28 node2-hostname gfs_controld[2739]: groupd_dispatch error -1 errno 0
Aug  4 20:20:28 node2-hostname gfs_controld[2739]: groupd connection died
Maybe check why the groupd connection fails?
Is this an active/passive cluster or ...?

I have done some work with RHCluster, but I have never been happy with it.

Code:
Aug  4 20:20:28 node2-hostname openais[2707]: [CMAN ] cman killed by node 1 because we rejoined the cluster without a full restart
The last 4 messages (the ccsd "Unable to connect to cluster infrastructure" lines) are probably there because cman is no longer running.
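
To make the order of events easier to see, the key lines can be pulled out of /var/log/messages with a small script. The following is only a rough sketch in Python and is not part of the cluster suite: the function name `key_events`, the labels, and the list of substrings are assumptions chosen for illustration, with the substrings copied from the log excerpt in this thread.

```python
import re

# Syslog-style line: month, day, time, host, daemon[pid]: message.
# \W+ between month and day tolerates any stray separator characters.
LINE_RE = re.compile(
    r"^(?P<month>[A-Za-z]{3})\W+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<daemon>[\w.-]+)(?:\[\d+\])?:\s+(?P<msg>.*)$"
)

# Substrings copied from the excerpt in this thread; the labels are made up
# for illustration and the list is not exhaustive.
KEY_EVENTS = {
    "token was lost": "totem token lost",
    "cman killed by node": "cman killed by peer",
    "cluster is down, exiting": "daemon exited",
    "Unable to connect to cluster infrastructure": "ccsd cannot reach cman",
}


def key_events(lines):
    """Return (time, daemon, label) for each line matching a known substring."""
    found = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a syslog-style line, skip it
        for needle, label in KEY_EVENTS.items():
            if needle in m.group("msg"):
                found.append((m.group("time"), m.group("daemon"), label))
                break
    return found


# A few lines taken from the first post.
SAMPLE = [
    "Aug  4 20:20:27 node2-hostname openais[2707]: [TOTEM] The token was lost in the OPERATIONAL state.",
    "Aug  4 20:20:28 node2-hostname openais[2707]: [CMAN ] cman killed by node 1 because we rejoined the cluster without a full restart",
    "Aug  4 20:20:28 node2-hostname dlm_controld[2733]: cluster is down, exiting",
    "Aug  4 20:20:29 node2-hostname fenced[2727]: cluster is down, exiting",
    "Aug  4 20:20:57 node2-hostname ccsd[2699]: Unable to connect to cluster infrastructure after 30 seconds.",
]

if __name__ == "__main__":
    for when, daemon, label in key_events(SAMPLE):
        print(when, daemon, label)
```

On a real system the same function could be fed `open("/var/log/messages")` instead of `SAMPLE`. In the excerpt, the token loss comes first, the cman kill follows one second later, and the daemon exits and ccsd messages only appear after that.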

Can you check logs on the other node as well?
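
When comparing the two nodes, it helps to look only at the lines around the time of the event on each side. Below is a minimal Python sketch of that idea, not an existing tool: the helper names `stamp` and `around` are made up, and because syslog lines carry no year, the year 2012 is assumed from the date of the first post.

```python
import re
from datetime import datetime, timedelta

# Month names are mapped by hand so parsing does not depend on the locale.
MONTHS = {name: i for i, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Leading syslog timestamp; \W+ tolerates stray separators after the month.
STAMP_RE = re.compile(r"^([A-Za-z]{3})\W+(\d{1,2})\s+(\d{2}):(\d{2}):(\d{2})\s")


def stamp(line, year):
    """Parse the leading timestamp of a syslog line, or return None."""
    m = STAMP_RE.match(line)
    if not m or m.group(1) not in MONTHS:
        return None
    mon, day, hh, mm, ss = m.groups()
    return datetime(year, MONTHS[mon], int(day), int(hh), int(mm), int(ss))


def around(lines, center, seconds, year):
    """Keep only the lines stamped within +/- `seconds` of `center`."""
    delta = timedelta(seconds=seconds)
    kept = []
    for line in lines:
        ts = stamp(line, year)
        if ts is not None and abs(ts - center) <= delta:
            kept.append(line)
    return kept


# Lines from node2 as posted above; node1's log would be filtered the same way.
SAMPLE = [
    "Aug  4 20:20:27 node2-hostname openais[2707]: [TOTEM] The token was lost in the OPERATIONAL state.",
    "Aug  4 20:20:28 node2-hostname openais[2707]: [CMAN ] cman killed by node 1 because we rejoined the cluster without a full restart",
    "Aug  4 20:22:27 node2-hostname ccsd[2699]: Unable to connect to cluster infrastructure after 120 seconds.",
]

if __name__ == "__main__":
    event = datetime(2012, 8, 4, 20, 20, 27)  # year is an assumption
    for line in around(SAMPLE, event, 30, 2012):
        print(line)
```

Running the same filter over /var/log/messages from node1 with the same `event` time would show what node1 logged while node2 lost the token, provided the clocks on both nodes are in sync.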

Also, what does clustat show?
 
Old 07-22-2017, 12:54 AM   #3
Mlnr492
LQ Newbie
 
Registered: Jul 2017
Posts: 1

Rep: Reputation: Disabled
Hi rajaniyer123, I am also facing the same problem. Could you please share the resolution?
 
  


All times are GMT -5.
