LinuxQuestions.org
Old 10-16-2008, 07:43 AM   #1
fft9qh
LQ Newbie
 
Registered: Oct 2008
Posts: 12

Rep: Reputation: 0
CMAN cannot rejoin cluster after restart


Hi all,

I have a two-node cluster using a quorum disk, running RHEL 5.2 with Xen. I have a very strange issue.
When I start the nodes, everything works fine: I can migrate the services, etc.

Cluster Status for DEVIL @ Thu Oct 16 14:24:20 2008
Member Status: Quorate

 Member Name                  ID   Status
 ------ ----                  ---- ------
 volovec.onsemi.com              1 Online, rgmanager
 baranec.onsemi.com              2 Online, Local, rgmanager
 /dev/emcpowerc1                 0 Online, Quorum Disk

 Service Name          Owner (Last)          State
 ------- ----          ----- ------          -----
 service:nfsserver     volovec.onsemi.com    started
 service:web           baranec.onsemi.com    started

Case A: When I restart node 1 (volovec), everything works correctly.
Services are migrated from volovec to baranec.
Node volovec is disconnected from the cluster.
After rebooting, node volovec rejoins the cluster and its default services are migrated back to it.

Case B: The strange part is what happens when I do the same with node 2 (baranec).
Services are migrated to volovec.
Node baranec is disconnected from the cluster.
But after rebooting, node baranec cannot join the cluster!

Messages on node Baranec:

Baranec shutdown sequence


Oct 16 13:37:09 baranec shutdown[15991]: shutting down for system reboot
Oct 16 13:37:09 baranec init: Switching to runlevel: 6
Oct 16 13:37:10 baranec modclusterd: shutdown succeeded
Oct 16 13:37:10 baranec rgmanager: [16112]: <notice> Shutting down Cluster Service Manager...
Oct 16 13:37:10 baranec clurgmgrd[15019]: <notice> Shutting down
Oct 16 13:37:10 baranec clurgmgrd[15019]: <notice> Shutting down
Oct 16 13:37:10 baranec clurgmgrd[15019]: <notice> Stopping service service:web
Oct 16 13:37:12 baranec clurgmgrd[15019]: <notice> Service service:web is stopped
Oct 16 13:37:12 baranec clurgmgrd[15019]: <notice> Shutdown complete, exiting
Oct 16 13:37:12 baranec rgmanager: [16112]: <notice> Cluster Service Manager is stopped.
Oct 16 13:37:43 baranec dlm_controld[9843]: cluster is down, exiting
Oct 16 13:37:43 baranec fenced[9837]: cluster is down, exiting
Oct 16 13:37:43 baranec gfs_controld[9849]: cluster is down, exiting
Oct 16 13:37:43 baranec kernel: dlm: closing connection to node 2
Oct 16 13:37:43 baranec kernel: dlm: closing connection to node 1
Oct 16 13:37:43 baranec ccsd[9803]: Stopping ccsd, SIGTERM received.
Oct 16 13:37:44 baranec rpc.statd[9600]: Caught signal 15, un-registering and exiting.


Baranec startup sequence


Oct 16 13:41:01 baranec kernel: DLM (built Sep 4 2008 04:07:33) installed
Oct 16 13:41:01 baranec kernel: GFS2 Overlay (built Apr 30 2008 17:31:00) installed
Oct 16 13:41:01 baranec kernel: Lock_DLM (built Sep 4 2008 04:08:21) installed
Oct 16 13:41:02 baranec ccsd[9817]: Starting ccsd 2.0.84:
Oct 16 13:41:02 baranec ccsd[9817]: Built: Sep 9 2008 16:28:45
Oct 16 13:41:02 baranec ccsd[9817]: Copyright (C) Red Hat, Inc. 2004 All rights reserved.
Oct 16 13:41:02 baranec ccsd[9817]: cluster.conf (cluster name = DEVIL, version = 11) found.
Oct 16 13:41:02 baranec ccsd[9817]: Unable to perform sendto: Cannot assign requested address
Oct 16 13:41:07 baranec openais[9823]: [MAIN ] AIS Executive Service RELEASE 'subrev 1358 version 0.80.3'
Oct 16 13:41:07 baranec openais[9823]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
Oct 16 13:41:07 baranec openais[9823]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
Oct 16 13:41:07 baranec openais[9823]: [MAIN ] AIS Executive Service: started and ready to provide service.
Oct 16 13:41:07 baranec openais[9823]: [MAIN ] Using default multicast address of 239.192.8.166
Oct 16 13:41:07 baranec openais[9823]: [MAIN ] openais component openais_cpg loaded.
Oct 16 13:41:07 baranec openais[9823]: [MAIN ] Registering service handler 'openais cluster closed process group service v1.01'
Oct 16 13:41:07 baranec openais[9823]: [MAIN ] openais component openais_cfg loaded.
Oct 16 13:41:07 baranec openais[9823]: [MAIN ] Registering service handler 'openais configuration service'
Oct 16 13:41:07 baranec openais[9823]: [MAIN ] openais component openais_msg loaded.
Oct 16 13:41:07 baranec openais[9823]: [MAIN ] Registering service handler 'openais message service B.01.01'
Oct 16 13:41:07 baranec openais[9823]: [MAIN ] openais component openais_lck loaded.
Oct 16 13:41:07 baranec openais[9823]: [MAIN ] Registering service handler 'openais distributed locking service B.01.01'
Oct 16 13:41:07 baranec openais[9823]: [MAIN ] openais component openais_evt loaded.
Oct 16 13:41:07 baranec openais[9823]: [MAIN ] Registering service handler 'openais event service B.01.01'
Oct 16 13:41:07 baranec openais[9823]: [MAIN ] openais component openais_ckpt loaded.
Oct 16 13:41:07 baranec openais[9823]: [MAIN ] Registering service handler 'openais checkpoint service B.01.01'
Oct 16 13:41:07 baranec openais[9823]: [MAIN ] openais component openais_amf loaded.
Oct 16 13:41:07 baranec openais[9823]: [MAIN ] Registering service handler 'openais availability management framework B.01.01'
Oct 16 13:41:07 baranec openais[9823]: [MAIN ] openais component openais_clm loaded.
Oct 16 13:41:07 baranec openais[9823]: [MAIN ] Registering service handler 'openais cluster membership service B.01.01'
Oct 16 13:41:07 baranec openais[9823]: [MAIN ] openais component openais_evs loaded.
Oct 16 13:41:07 baranec openais[9823]: [MAIN ] Registering service handler 'openais extended virtual synchrony service'
Oct 16 13:41:07 baranec openais[9823]: [MAIN ] openais component openais_cman loaded.
Oct 16 13:41:07 baranec openais[9823]: [MAIN ] Registering service handler 'openais CMAN membership service 2.01'
Oct 16 13:41:07 baranec openais[9823]: [TOTEM] Token Timeout (25000 ms) retransmit timeout (124 ms)
Oct 16 13:41:07 baranec openais[9823]: [TOTEM] token hold (89 ms) retransmits before loss (200 retrans)
Oct 16 13:41:07 baranec openais[9823]: [TOTEM] join (120 ms) send_join (0 ms) consensus (4800 ms) merge (200 ms)
Oct 16 13:41:07 baranec openais[9823]: [TOTEM] downcheck (1000 ms) fail to recv const (50 msgs)
Oct 16 13:41:07 baranec openais[9823]: [TOTEM] seqno unchanged const (30 rotations) Maximum network MTU 1500
Oct 16 13:41:07 baranec openais[9823]: [TOTEM] window size per rotation (50 messages) maximum messages per rotation (17 messages)
Oct 16 13:41:07 baranec openais[9823]: [TOTEM] send threads (0 threads)
Oct 16 13:41:07 baranec openais[9823]: [TOTEM] RRP token expired timeout (124 ms)
Oct 16 13:41:07 baranec openais[9823]: [TOTEM] RRP token problem counter (2000 ms)
Oct 16 13:41:07 baranec openais[9823]: [TOTEM] RRP threshold (10 problem count)
Oct 16 13:41:07 baranec openais[9823]: [TOTEM] RRP mode set to none.
Oct 16 13:41:07 baranec openais[9823]: [TOTEM] heartbeat_failures_allowed (0)
Oct 16 13:41:07 baranec openais[9823]: [TOTEM] max_network_delay (50 ms)
Oct 16 13:41:07 baranec openais[9823]: [TOTEM] HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
Oct 16 13:41:07 baranec openais[9823]: [TOTEM] Receive multicast socket recv buffer size (262142 bytes).
Oct 16 13:41:07 baranec openais[9823]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Oct 16 13:41:07 baranec openais[9823]: [TOTEM] The network interface [10.250.40.24] is now up.
Oct 16 13:41:07 baranec openais[9823]: [TOTEM] Created or loaded sequence id 9812.10.250.40.24 for this ring.
Oct 16 13:41:07 baranec openais[9823]: [TOTEM] entering GATHER state from 15.
Oct 16 13:41:07 baranec openais[9823]: [SERV ] Initialising service handler 'openais extended virtual synchrony service'
Oct 16 13:41:07 baranec openais[9823]: [SERV ] Initialising service handler 'openais cluster membership service B.01.01'
Oct 16 13:41:07 baranec openais[9823]: [SERV ] Initialising service handler 'openais availability management framework B.01.01'
Oct 16 13:41:07 baranec openais[9823]: [SERV ] Initialising service handler 'openais checkpoint service B.01.01'
Oct 16 13:41:07 baranec openais[9823]: [SERV ] Initialising service handler 'openais event service B.01.01'
Oct 16 13:41:07 baranec openais[9823]: [SERV ] Initialising service handler 'openais distributed locking service B.01.01'
Oct 16 13:41:07 baranec openais[9823]: [SERV ] Initialising service handler 'openais message service B.01.01'
Oct 16 13:41:07 baranec openais[9823]: [SERV ] Initialising service handler 'openais configuration service'
Oct 16 13:41:07 baranec openais[9823]: [SERV ] Initialising service handler 'openais cluster closed process group service v1.01'
Oct 16 13:41:07 baranec openais[9823]: [SERV ] Initialising service handler 'openais CMAN membership service 2.01'
Oct 16 13:41:07 baranec openais[9823]: [CMAN ] CMAN 2.0.84 (built Sep 9 2008 16:28:49) started
Oct 16 13:41:07 baranec openais[9823]: [SYNC ] Not using a virtual synchrony filter.
Oct 16 13:41:07 baranec openais[9823]: [TOTEM] Creating commit token because I am the rep.
Oct 16 13:41:07 baranec openais[9823]: [TOTEM] Saving state aru 0 high seq received 0
Oct 16 13:41:07 baranec openais[9823]: [TOTEM] Storing new sequence id for ring 2658
Oct 16 13:41:07 baranec openais[9823]: [TOTEM] entering COMMIT state.
Oct 16 13:41:07 baranec openais[9823]: [TOTEM] entering RECOVERY state.
Oct 16 13:41:07 baranec openais[9823]: [TOTEM] position [0] member 10.250.40.24:
Oct 16 13:41:07 baranec openais[9823]: [TOTEM] previous ring seq 9812 rep 10.250.40.24
Oct 16 13:41:07 baranec openais[9823]: [TOTEM] aru 0 high delivered 0 received flag 1
Oct 16 13:41:07 baranec openais[9823]: [TOTEM] Did not need to originate any messages in recovery.
Oct 16 13:41:07 baranec openais[9823]: [TOTEM] Sending initial ORF token
Oct 16 13:41:07 baranec openais[9823]: [CLM ] CLM CONFIGURATION CHANGE
Oct 16 13:41:07 baranec openais[9823]: [CLM ] New Configuration:
Oct 16 13:41:07 baranec openais[9823]: [CLM ] Members Left:
Oct 16 13:41:07 baranec openais[9823]: [CLM ] Members Joined:
Oct 16 13:41:07 baranec openais[9823]: [CLM ] CLM CONFIGURATION CHANGE
Oct 16 13:41:07 baranec openais[9823]: [CLM ] New Configuration:
Oct 16 13:41:07 baranec openais[9823]: [CLM ] r(0) ip(10.250.40.24)
Oct 16 13:41:07 baranec openais[9823]: [CLM ] Members Left:
Oct 16 13:41:07 baranec openais[9823]: [CLM ] Members Joined:
Oct 16 13:41:07 baranec openais[9823]: [CLM ] r(0) ip(10.250.40.24)
Oct 16 13:41:07 baranec openais[9823]: [SYNC ] This node is within the primary component and will provide service.
Oct 16 13:41:07 baranec openais[9823]: [TOTEM] entering OPERATIONAL state.
Oct 16 13:41:07 baranec openais[9823]: [CLM ] got nodejoin message 10.250.40.24
Oct 16 13:41:08 baranec ccsd[9817]: Initial status:: Inquorate
Oct 16 13:41:08 baranec ccsd[9817]: Cluster is not quorate. Refusing connection.

Oct 16 13:41:10 baranec ccsd[9817]: Error while processing connect: Connection refused
Oct 16 13:41:10 baranec ccsd[9817]: Cluster is not quorate. Refusing connection.
Oct 16 13:41:18 baranec dlm_controld[9845]: connect to ccs error -111, check ccsd or cluster status
Oct 16 13:41:18 baranec gfs_controld[9851]: connect to ccs error -111, check ccsd or cluster status
Oct 16 13:46:15 baranec qdiskd[9885]: <info> Node 1 is the master

Oct 16 13:46:19 baranec qdiskd[9885]: <info> Initial score 1/1
Oct 16 13:46:19 baranec qdiskd[9885]: <info> Initialization complete
Oct 16 13:46:19 baranec openais[9823]: [CMAN ] quorum device registered
Oct 16 13:46:19 baranec ccsd[9817]: Cluster is not quorate. Refusing connection.
Oct 16 13:46:19 baranec ccsd[9817]: Error while processing connect: Connection refused
Oct 16 13:46:19 baranec ccsd[9817]: Cluster is not quorate. Refusing connection.
Oct 16 13:46:19 baranec ccsd[9817]: Error while processing connect: Connection refused
Oct 16 13:46:19 baranec qdiskd[9885]: <notice> Score sufficient for master operation (1/1; required=1); upgrading



Messages on node Volovec:

during Baranec shutdown:

Oct 16 13:37:12 volovec clurgmgrd[15014]: <notice> Member 2 shutting down
Oct 16 13:37:17 volovec clurgmgrd[15014]: <notice> Starting stopped service service:web
Oct 16 13:37:18 volovec clurgmgrd[15014]: <notice> Service service:web started
Oct 16 13:37:42 volovec qdiskd[9885]: <info> Node 2 shutdown
Oct 16 13:38:08 volovec openais[9813]: [TOTEM] The token was lost in the OPERATIONAL state.
Oct 16 13:38:08 volovec openais[9813]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
Oct 16 13:38:08 volovec openais[9813]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Oct 16 13:38:08 volovec openais[9813]: [TOTEM] entering GATHER state from 2.
Oct 16 13:38:13 volovec openais[9813]: [TOTEM] entering GATHER state from 0.
Oct 16 13:38:13 volovec openais[9813]: [TOTEM] Creating commit token because I am the rep.
Oct 16 13:38:13 volovec openais[9813]: [TOTEM] Saving state aru 69 high seq received 69
Oct 16 13:38:13 volovec openais[9813]: [TOTEM] Storing new sequence id for ring 2658
Oct 16 13:38:13 volovec openais[9813]: [TOTEM] entering COMMIT state.
Oct 16 13:38:13 volovec openais[9813]: [TOTEM] entering RECOVERY state.
Oct 16 13:38:13 volovec openais[9813]: [TOTEM] position [0] member 10.250.40.14:
Oct 16 13:38:13 volovec openais[9813]: [TOTEM] previous ring seq 9812 rep 10.250.40.14
Oct 16 13:38:13 volovec openais[9813]: [TOTEM] aru 69 high delivered 69 received flag 1
Oct 16 13:38:13 volovec openais[9813]: [TOTEM] Did not need to originate any messages in recovery.
Oct 16 13:38:13 volovec openais[9813]: [TOTEM] Sending initial ORF token
Oct 16 13:38:13 volovec openais[9813]: [CLM ] CLM CONFIGURATION CHANGE
Oct 16 13:38:13 volovec openais[9813]: [CLM ] New Configuration:
Oct 16 13:38:13 volovec openais[9813]: [CLM ] r(0) ip(10.250.40.14)
Oct 16 13:38:13 volovec openais[9813]: [CLM ] Members Left:
Oct 16 13:38:13 volovec kernel: dlm: closing connection to node 2
Oct 16 13:38:13 volovec openais[9813]: [CLM ] r(0) ip(10.250.40.24)
Oct 16 13:38:13 volovec openais[9813]: [CLM ] Members Joined:
Oct 16 13:38:13 volovec openais[9813]: [CLM ] CLM CONFIGURATION CHANGE
Oct 16 13:38:13 volovec openais[9813]: [CLM ] New Configuration:
Oct 16 13:38:13 volovec openais[9813]: [CLM ] r(0) ip(10.250.40.14)
Oct 16 13:38:13 volovec openais[9813]: [CLM ] Members Left:
Oct 16 13:38:13 volovec openais[9813]: [CLM ] Members Joined:
Oct 16 13:38:13 volovec openais[9813]: [SYNC ] This node is within the primary component and will provide service.
Oct 16 13:38:13 volovec openais[9813]: [TOTEM] entering OPERATIONAL state.
Oct 16 13:38:13 volovec openais[9813]: [CLM ] got nodejoin message 10.250.40.14
Oct 16 13:38:13 volovec openais[9813]: [CPG ] got joinlist message from node 1


after baranec startup

Oct 16 13:41:07 volovec openais[9813]: [TOTEM] entering GATHER state from 11.
Oct 16 13:41:12 volovec openais[9813]: [TOTEM] entering GATHER state from 0.
Oct 16 13:41:12 volovec openais[9813]: [TOTEM] Creating commit token because I am the rep.
Oct 16 13:41:12 volovec openais[9813]: [TOTEM] Saving state aru c high seq received c
Oct 16 13:41:12 volovec openais[9813]: [TOTEM] Storing new sequence id for ring 265c
Oct 16 13:41:12 volovec openais[9813]: [TOTEM] entering COMMIT state.
Oct 16 13:41:12 volovec openais[9813]: [TOTEM] entering RECOVERY state.
Oct 16 13:41:12 volovec openais[9813]: [TOTEM] position [0] member 10.250.40.14:
Oct 16 13:41:12 volovec openais[9813]: [TOTEM] previous ring seq 9816 rep 10.250.40.14
Oct 16 13:41:12 volovec openais[9813]: [TOTEM] aru c high delivered c received flag 1
Oct 16 13:41:12 volovec openais[9813]: [TOTEM] Did not need to originate any messages in recovery.
Oct 16 13:41:12 volovec openais[9813]: [TOTEM] Sending initial ORF token
Oct 16 13:41:12 volovec openais[9813]: [CLM ] CLM CONFIGURATION CHANGE
Oct 16 13:41:12 volovec openais[9813]: [CLM ] New Configuration:
Oct 16 13:41:12 volovec openais[9813]: [CLM ] r(0) ip(10.250.40.14)
Oct 16 13:41:12 volovec openais[9813]: [CLM ] Members Left:
Oct 16 13:41:12 volovec openais[9813]: [CLM ] Members Joined:
Oct 16 13:41:12 volovec openais[9813]: [CLM ] CLM CONFIGURATION CHANGE
Oct 16 13:41:12 volovec openais[9813]: [CLM ] New Configuration:
Oct 16 13:41:12 volovec openais[9813]: [CLM ] r(0) ip(10.250.40.14)
Oct 16 13:41:12 volovec openais[9813]: [CLM ] Members Left:
Oct 16 13:41:12 volovec openais[9813]: [CLM ] Members Joined:
Oct 16 13:41:12 volovec openais[9813]: [SYNC ] This node is within the primary component and will provide service.
Oct 16 13:41:12 volovec openais[9813]: [TOTEM] entering OPERATIONAL state.
Oct 16 13:41:12 volovec openais[9813]: [CLM ] got nodejoin message 10.250.40.14
Oct 16 13:41:12 volovec openais[9813]: [CPG ] got joinlist message from node 1
Oct 16 13:41:12 volovec openais[9813]: [TOTEM] entering GATHER state from 9.
Oct 16 13:41:17 volovec openais[9813]: [TOTEM] entering GATHER state from 0.
Oct 16 13:41:17 volovec openais[9813]: [TOTEM] Creating commit token because I am the rep.
Oct 16 13:41:17 volovec openais[9813]: [TOTEM] Saving state aru b high seq received b
Oct 16 13:41:17 volovec openais[9813]: [TOTEM] Storing new sequence id for ring 2660
Oct 16 13:41:17 volovec openais[9813]: [TOTEM] entering COMMIT state.
Oct 16 13:41:17 volovec openais[9813]: [TOTEM] entering RECOVERY state.
Oct 16 13:41:17 volovec openais[9813]: [TOTEM] position [0] member 10.250.40.14:
Oct 16 13:41:17 volovec openais[9813]: [TOTEM] previous ring seq 9820 rep 10.250.40.14
Oct 16 13:41:17 volovec openais[9813]: [TOTEM] aru b high delivered b received flag 1
Oct 16 13:41:17 volovec openais[9813]: [TOTEM] Did not need to originate any messages in recovery.
Oct 16 13:41:17 volovec openais[9813]: [TOTEM] Sending initial ORF token
Oct 16 13:41:17 volovec openais[9813]: [CLM ] CLM CONFIGURATION CHANGE
Oct 16 13:41:17 volovec openais[9813]: [CLM ] New Configuration:
Oct 16 13:41:17 volovec openais[9813]: [CLM ] r(0) ip(10.250.40.14)
Oct 16 13:41:17 volovec openais[9813]: [CLM ] Members Left:
Oct 16 13:41:17 volovec openais[9813]: [CLM ] Members Joined:
Oct 16 13:41:17 volovec openais[9813]: [CLM ] CLM CONFIGURATION CHANGE
Oct 16 13:41:17 volovec openais[9813]: [CLM ] New Configuration:
Oct 16 13:41:17 volovec openais[9813]: [CLM ] r(0) ip(10.250.40.14)
Oct 16 13:41:17 volovec openais[9813]: [CLM ] Members Left:
Oct 16 13:41:17 volovec openais[9813]: [CLM ] Members Joined:
Oct 16 13:41:17 volovec openais[9813]: [SYNC ] This node is within the primary component and will provide service.
Oct 16 13:41:17 volovec openais[9813]: [TOTEM] entering OPERATIONAL state.
Oct 16 13:41:17 volovec openais[9813]: [CLM ] got nodejoin message 10.250.40.14
Oct 16 13:41:17 volovec openais[9813]: [CPG ] got joinlist message from node 1
Oct 16 13:41:17 volovec openais[9813]: [TOTEM] entering GATHER state from 9.


The result is that the restarted node baranec loops forever on these messages:

Oct 16 13:41:10 baranec ccsd[9817]: Error while processing connect: Connection refused
Oct 16 13:41:10 baranec ccsd[9817]: Cluster is not quorate. Refusing connection.
Oct 16 13:41:18 baranec dlm_controld[9845]: connect to ccs error -111, check ccsd or cluster status
Oct 16 13:41:18 baranec gfs_controld[9851]: connect to ccs error -111, check ccsd or cluster status

and the "waiting" node volovec loops forever on these:

Oct 16 13:41:22 volovec openais[9813]: [TOTEM] entering GATHER state from 0.
Oct 16 13:41:22 volovec openais[9813]: [TOTEM] Creating commit token because I am the rep.
Oct 16 13:41:22 volovec openais[9813]: [TOTEM] Saving state aru b high seq received b
Oct 16 13:41:22 volovec openais[9813]: [TOTEM] Storing new sequence id for ring 2664
Oct 16 13:41:22 volovec openais[9813]: [TOTEM] entering COMMIT state.
Oct 16 13:41:22 volovec openais[9813]: [TOTEM] entering RECOVERY state.
Oct 16 13:41:22 volovec openais[9813]: [TOTEM] position [0] member 10.250.40.14:
Oct 16 13:41:22 volovec openais[9813]: [TOTEM] previous ring seq 9824 rep 10.250.40.14
Oct 16 13:41:22 volovec openais[9813]: [TOTEM] aru b high delivered b received flag 1
Oct 16 13:41:22 volovec openais[9813]: [TOTEM] Did not need to originate any messages in recovery.
Oct 16 13:41:22 volovec openais[9813]: [TOTEM] Sending initial ORF token
Oct 16 13:41:22 volovec openais[9813]: [CLM ] CLM CONFIGURATION CHANGE
Oct 16 13:41:22 volovec openais[9813]: [CLM ] New Configuration:
Oct 16 13:41:22 volovec openais[9813]: [CLM ] r(0) ip(10.250.40.14)
Oct 16 13:41:22 volovec openais[9813]: [CLM ] Members Left:
Oct 16 13:41:22 volovec openais[9813]: [CLM ] Members Joined:
Oct 16 13:41:22 volovec openais[9813]: [CLM ] CLM CONFIGURATION CHANGE
Oct 16 13:41:22 volovec openais[9813]: [CLM ] New Configuration:
Oct 16 13:41:22 volovec openais[9813]: [CLM ] r(0) ip(10.250.40.14)
Oct 16 13:41:22 volovec openais[9813]: [CLM ] Members Left:
Oct 16 13:41:22 volovec openais[9813]: [CLM ] Members Joined:
Oct 16 13:41:22 volovec openais[9813]: [SYNC ] This node is within the primary component and will provide service.
Oct 16 13:41:22 volovec openais[9813]: [TOTEM] entering OPERATIONAL state.
Oct 16 13:41:22 volovec openais[9813]: [CLM ] got nodejoin message 10.250.40.14
Oct 16 13:41:22 volovec openais[9813]: [CPG ] got joinlist message from node 1
Oct 16 13:41:22 volovec openais[9813]: [TOTEM] entering GATHER state from 9.
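
To make the loop above easier to read, here is a minimal Python sketch that pulls the ring ids and ring members out of an openais syslog excerpt. It assumes the ring ids are hexadecimal (the log shows "ring 265c" followed by "previous ring seq 9820", and 0x265c is 9820). The function name and the regular expressions are illustrative only and are not part of any cluster tool.

```python
import re

# Patterns for the two openais TOTEM lines quoted in this post.
RING_RE = re.compile(r"Storing new sequence id for ring ([0-9a-f]+)")
MEMBER_RE = re.compile(r"position \[\d+\] member (\d+\.\d+\.\d+\.\d+):")


def summarize_rings(log_text):
    """Return (ring_ids, members) found in an openais syslog excerpt.

    ring_ids are parsed as hexadecimal (an assumption based on the log above);
    members is the sorted set of IPs listed as ring members.
    """
    ring_ids = [int(m.group(1), 16) for m in RING_RE.finditer(log_text)]
    members = sorted(set(MEMBER_RE.findall(log_text)))
    return ring_ids, members


SAMPLE = """\
Oct 16 13:41:12 volovec openais[9813]: [TOTEM] Storing new sequence id for ring 265c
Oct 16 13:41:12 volovec openais[9813]: [TOTEM] position [0] member 10.250.40.14:
Oct 16 13:41:17 volovec openais[9813]: [TOTEM] Storing new sequence id for ring 2660
Oct 16 13:41:17 volovec openais[9813]: [TOTEM] position [0] member 10.250.40.14:
Oct 16 13:41:22 volovec openais[9813]: [TOTEM] Storing new sequence id for ring 2664
Oct 16 13:41:22 volovec openais[9813]: [TOTEM] position [0] member 10.250.40.14:
"""

ring_ids, members = summarize_rings(SAMPLE)
print([hex(r) for r in ring_ids], members)
```

On the excerpt above, every pass stores a new ring id but the member list only ever contains 10.250.40.14, which suggests volovec keeps re-forming a single-member ring and never sees baranec (10.250.40.24) as a member.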



Here is my cluster.conf:

<?xml version="1.0"?>
<cluster alias="DEVIL" config_version="11" name="DEVIL">
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="60"/>
  <clusternodes>
    <clusternode name="volovec.onsemi.com" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device modulename="" name="dracvolovec"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="baranec.onsemi.com" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device modulename="" name="dracbaranec"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="2" two_node="0"/>
  <fencedevices>
    <fencedevice agent="fence_drac" ipaddr="10.250.40.13" login="root" name="dracvolovec" passwd="calvin"/>
    <fencedevice agent="fence_drac" ipaddr="10.250.40.23" login="root" name="dracbaranec" passwd="calvin"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="prim_baranec">
        <failoverdomainnode name="baranec.onsemi.com" priority="1"/>
      </failoverdomain>
      <failoverdomain name="prim_volovec">
        <failoverdomainnode name="volovec.onsemi.com" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <resources/>
    <service autostart="1" domain="prim_baranec" name="web">
      <script file="/etc/init.d/httpd" name="apache"/>
    </service>
    <service autostart="1" domain="prim_volovec" name="nfsserver">
      <script file="/etc/init.d/nfs" name="nfs"/>
    </service>
  </rm>
  <totem consensus="4800" join="120" token="25000" token_retransmits_before_loss_const="200"/>
  <quorumd device="/dev/emcpowerc1" interval="2" min_score="1" tko="5" votes="1"/>
</cluster>
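
For anyone checking the vote arithmetic in this configuration, the following Python sketch sums the votes and derives a quorum threshold. It assumes CMAN computes quorum as expected_votes // 2 + 1; that formula, the trimmed copy of the config, and the function name are assumptions made for illustration, so verify them against the cman and qdisk man pages for your release.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cluster.conf above: only the vote-related elements.
CONF = """<cluster alias="DEVIL" config_version="11" name="DEVIL">
  <clusternodes>
    <clusternode name="volovec.onsemi.com" nodeid="1" votes="1"/>
    <clusternode name="baranec.onsemi.com" nodeid="2" votes="1"/>
  </clusternodes>
  <cman expected_votes="2" two_node="0"/>
  <quorumd device="/dev/emcpowerc1" interval="2" min_score="1" tko="5" votes="1"/>
</cluster>"""


def vote_summary(conf_xml):
    """Sum node and quorum-disk votes and derive the quorum threshold."""
    root = ET.fromstring(conf_xml)
    node_votes = sum(int(n.get("votes", "1")) for n in root.iter("clusternode"))
    qdisk = root.find("quorumd")
    qdisk_votes = int(qdisk.get("votes", "0")) if qdisk is not None else 0
    expected = int(root.find("cman").get("expected_votes"))
    # Assumption: quorum threshold is expected_votes // 2 + 1.
    quorum = expected // 2 + 1
    return {
        "node_votes": node_votes,
        "qdisk_votes": qdisk_votes,
        "total_votes": node_votes + qdisk_votes,
        "expected_votes": expected,
        "quorum": quorum,
    }


summary = vote_summary(CONF)
print(summary)
```

Under that assumption the threshold is 2 votes, so a node that sees only itself (1 vote) is inquorate until the quorum disk contributes its vote. That would be consistent with baranec logging "Initial status:: Inquorate" before qdiskd registers at 13:46:19. The sketch also shows that the total number of votes (3) differs from expected_votes (2), which may be worth checking against the qdisk documentation.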


I am running RHEL 5.2, fully up to date, and I try to keep the configuration identical on both servers. If you have any idea what to do, please let me know.

thank you very much,

regards
Stefan
 
Old 10-20-2008, 02:05 AM   #2
fft9qh
LQ Newbie
 
Registered: Oct 2008
Posts: 12

Original Poster
Rep: Reputation: 0
Hi all,
I have an update on this.

I found that the issue is on the side of node volovec. This node is not able to take back any node that is starting up and wants to rejoin the cluster.

I need to perform additional tests to investigate whether it is a HW issue or whether I have to reinstall the server.

Stefan
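
One test that might help separate a network problem from a software problem on volovec is to check whether multicast traffic for the group shown in the openais log (239.192.8.166) actually passes between the two nodes. Below is a minimal Python sketch of such a probe. The port 15405 is an arbitrary spare port chosen for this example so that it does not clash with the running cluster, and the helper names are hypothetical. Whether multicast is really the cause here is not established.

```python
import socket
import struct

GROUP = "239.192.8.166"  # multicast address reported in the openais log
PORT = 15405             # assumption: arbitrary spare test port, not the cluster port


def build_mreq(group, iface_ip):
    """Pack the ip_mreq structure used to join `group` on interface `iface_ip`."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface_ip))


def open_receiver(group, port, iface_ip):
    """Return a UDP socket that has joined `group` on `iface_ip`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    build_mreq(group, iface_ip))
    return sock


def send_probe(group, port, iface_ip, payload=b"mcast-probe"):
    """Send one datagram to `group` out of `iface_ip`; return bytes sent."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                        socket.inet_aton(iface_ip))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        return sock.sendto(payload, (group, port))
    finally:
        sock.close()
```

As a usage idea, volovec could run open_receiver(GROUP, PORT, "10.250.40.14") and then call recvfrom(1024) on the returned socket, while baranec calls send_probe(GROUP, PORT, "10.250.40.24"), and then the roles are swapped. If the probe arrives in one direction only, the network path (for example the Xen bridge or the switch) would be worth a closer look before reinstalling.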
 
  

