LinuxQuestions.org
08-11-2008, 01:26 PM   #1
psteele555
LQ Newbie
 
Registered: Jun 2008
Distribution: RHEL 5.0
Posts: 24

Rep: Reputation: 15
Running groupd causes initially connected node of Red Hat cluster to disconnect RHEL5


For some time now I have been trying to set up a Red Hat cluster (to test GFS capabilities) on RHEL5. I've gotten to the point where I am following this document: http://sources.redhat.com/git/?p=clu...db3;hb=STABLE2 because the rest of the documentation that I have been able to find relates mostly to RHEL4.1.

I am following this procedure:

Quote:
Startup procedure
-----------------

To use the init script, run "service cman start" on all nodes. When getting a
cluster set up initially, it can be helpful to do all the steps manually as
follows.

Run these commands on each cluster node:

> mount -t configfs none /sys/kernel/config
> ccsd
> cman_tool join
> groupd
> fenced
> fence_tool join
> dlm_controld
> gfs_controld
> clvmd (only necessary if using clvm volumes for gfs)
> mkfs -t gfs -p lock_dlm -t <clustername>:<fsname> -j <#journals> <blockdev>
> mount -t gfs [-v] <blockdev> <mountpoint>
(emphasis added.)
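
The quoted order can be sketched as a small Python wrapper that runs each step in turn and stops at the first failure. This is only an illustration: the command names and arguments come from the procedure above, but `startup_commands`, `run_startup`, and the `runner` hook are hypothetical names, and the mkfs and mount steps are left out because they need site-specific arguments.

```python
import subprocess

# Order copied from the quoted procedure. mkfs and mount are omitted
# because they need a cluster name, journal count, and block device.
STARTUP_STEPS = [
    ["mount", "-t", "configfs", "none", "/sys/kernel/config"],
    ["ccsd"],
    ["cman_tool", "join"],
    ["groupd"],
    ["fenced"],
    ["fence_tool", "join"],
    ["dlm_controld"],
    ["gfs_controld"],
]


def startup_commands(use_clvm=False):
    """Return the ordered command list, adding clvmd only when requested."""
    steps = [list(step) for step in STARTUP_STEPS]
    if use_clvm:
        steps.append(["clvmd"])
    return steps


def run_startup(use_clvm=False, runner=None):
    """Run each step in order; an exception from `runner` stops the sequence.

    `runner` is a hypothetical hook for dry runs. By default each command
    is executed with subprocess.run(check=True).
    """
    if runner is None:
        runner = lambda cmd: subprocess.run(cmd, check=True)
    done = []
    for cmd in startup_commands(use_clvm):
        runner(cmd)  # raises on failure, so later daemons are not started
        done.append(" ".join(cmd))
    return done


if __name__ == "__main__":
    # Dry run: list the commands without executing anything.
    for line in run_startup(use_clvm=True, runner=lambda cmd: None):
        print(line)
```

Passing a no-op `runner` lists the sequence without touching the node, which is handy for checking the order before running it as root.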

Initially I had a two-node cluster with 2 expected votes, 1 vote per node. After the two nodes, node4008a4 and node4003a6, connected to each other, I ran groupd on both. After a little while they connected again, but I got this message:

Quote:
openais[3136]: [MAIN ] Node node4008a4.localdomain not joined to cman because it has rejoined an inquorate cluster
cman_tool status on both machines showed them as connected, but with nodes disallowed. So I changed the votes for node4008a4 to 2 and the expected votes for the cluster to 3. I rebooted and tried again. This time, node4008a4 showed the following:

Quote:
Aug 11 13:15:08 node4008a4 openais[4332]: [TOTEM] Retransmit List: 15
Aug 11 13:15:15 node4008a4 last message repeated 49 times


Aug 11 13:15:25 node4008a4 openais[4332]: [TOTEM] Retransmit List: 15
Aug 11 13:15:25 node4008a4 openais[4332]: [TOTEM] FAILED TO RECEIVE
Aug 11 13:15:25 node4008a4 openais[4332]: [TOTEM] entering GATHER state from 6.
Aug 11 13:15:29 node4008a4 openais[4332]: [TOTEM] entering GATHER state from 0.
Aug 11 13:15:29 node4008a4 openais[4332]: [TOTEM] Creating commit token because I am the rep.
Aug 11 13:15:29 node4008a4 openais[4332]: [TOTEM] Saving state aru 15 high seq received 16
Aug 11 13:15:29 node4008a4 openais[4332]: [TOTEM] entering COMMIT state.
Aug 11 13:15:29 node4008a4 openais[4332]: [TOTEM] entering RECOVERY state.
Aug 11 13:15:29 node4008a4 openais[4332]: [TOTEM] position [0] member 10.125.8.4:
Aug 11 13:15:29 node4008a4 openais[4332]: [TOTEM] previous ring seq 8 rep 10.125.3.6
Aug 11 13:15:29 node4008a4 openais[4332]: [TOTEM] aru 15 high delivered 15 received flag 0
Aug 11 13:15:29 node4008a4 openais[4332]: [TOTEM] copying all old ring messages from 16-16.
Aug 11 13:15:29 node4008a4 openais[4332]: [TOTEM] Originated 0 messages in RECOVERY.
Aug 11 13:15:29 node4008a4 openais[4332]: [TOTEM] Originated for recovery:
Aug 11 13:15:29 node4008a4 openais[4332]: [TOTEM] Not Originated for recovery: 16
Aug 11 13:15:29 node4008a4 openais[4332]: [TOTEM] Storing new sequence id for ring c
Aug 11 13:15:29 node4008a4 openais[4332]: [TOTEM] Sending initial ORF token
Aug 11 13:15:29 node4008a4 openais[4332]: [CLM ] CLM CONFIGURATION CHANGE
Aug 11 13:15:29 node4008a4 openais[4332]: [CLM ] New Configuration:
Aug 11 13:15:29 node4008a4 openais[4332]: [CLM ] r(0) ip(10.125.8.4)
Aug 11 13:15:29 node4008a4 openais[4332]: [CLM ] Members Left:
Aug 11 13:15:29 node4008a4 openais[4332]: [CLM ] r(0) ip(10.125.3.6)
Aug 11 13:15:29 node4008a4 openais[4332]: [CLM ] Members Joined:
Aug 11 13:15:29 node4008a4 openais[4332]: [SYNC ] This node is within the primary component and will provide service.
Aug 11 13:15:29 node4008a4 openais[4332]: [CLM ] CLM CONFIGURATION CHANGE
Aug 11 13:15:29 node4008a4 openais[4332]: [CLM ] New Configuration:
Aug 11 13:15:29 node4008a4 openais[4332]: [CLM ] r(0) ip(10.125.8.4)
Aug 11 13:15:29 node4008a4 openais[4332]: [CLM ] Members Left:
Aug 11 13:15:29 node4008a4 openais[4332]: [CLM ] Members Joined:
Aug 11 13:15:29 node4008a4 openais[4332]: [SYNC ] This node is within the primary component and will provide service.
Aug 11 13:15:29 node4008a4 openais[4332]: [TOTEM] entering OPERATIONAL state.
Aug 11 13:15:30 node4008a4 openais[4332]: [CLM ] got nodejoin message 10.125.8.4
Aug 11 13:15:30 node4008a4 openais[4332]: [CPG ] got joinlist message from node 1
Aug 11 13:19:45 node4008a4 openais[4332]: [TOTEM] entering GATHER state from 11.
Aug 11 13:19:45 node4008a4 openais[4332]: [TOTEM] Saving state aru c high seq received c
Aug 11 13:19:45 node4008a4 openais[4332]: [TOTEM] entering COMMIT state.
Aug 11 13:19:45 node4008a4 openais[4332]: [TOTEM] entering RECOVERY state.
Aug 11 13:19:45 node4008a4 openais[4332]: [TOTEM] position [0] member 10.125.3.6:
Aug 11 13:19:45 node4008a4 openais[4332]: [TOTEM] previous ring seq 12 rep 10.125.3.6
Aug 11 13:19:45 node4008a4 openais[4332]: [TOTEM] aru d high delivered d received flag 0
Aug 11 13:19:45 node4008a4 openais[4332]: [TOTEM] position [1] member 10.125.8.4:
Aug 11 13:19:45 node4008a4 openais[4332]: [TOTEM] previous ring seq 12 rep 10.125.8.4
Aug 11 13:19:45 node4008a4 openais[4332]: [TOTEM] aru c high delivered c received flag 0
Aug 11 13:19:45 node4008a4 openais[4332]: [TOTEM] Did not need to originate any messages in recovery.
Aug 11 13:19:45 node4008a4 openais[4332]: [TOTEM] Storing new sequence id for ring 10
Aug 11 13:19:45 node4008a4 openais[4332]: [CLM ] CLM CONFIGURATION CHANGE
Aug 11 13:19:45 node4008a4 openais[4332]: [CLM ] New Configuration:
Aug 11 13:19:45 node4008a4 openais[4332]: [CLM ] r(0) ip(10.125.8.4)
Aug 11 13:19:45 node4008a4 openais[4332]: [CLM ] Members Left:
Aug 11 13:19:45 node4008a4 openais[4332]: [CLM ] Members Joined:
Aug 11 13:19:45 node4008a4 openais[4332]: [SYNC ] This node is within the primary component and will provide service.
Aug 11 13:19:45 node4008a4 openais[4332]: [CLM ] CLM CONFIGURATION CHANGE
Aug 11 13:19:45 node4008a4 openais[4332]: [CLM ] New Configuration:
Aug 11 13:19:45 node4008a4 openais[4332]: [CLM ] r(0) ip(10.125.3.6)
Aug 11 13:19:45 node4008a4 openais[4332]: [CLM ] r(0) ip(10.125.8.4)
Aug 11 13:19:45 node4008a4 openais[4332]: [CLM ] Members Left:
Aug 11 13:19:45 node4008a4 openais[4332]: [CLM ] Members Joined:
Aug 11 13:19:45 node4008a4 openais[4332]: [CLM ] r(0) ip(10.125.3.6)
Aug 11 13:19:45 node4008a4 openais[4332]: [SYNC ] This node is within the primary component and will provide service.
Aug 11 13:19:45 node4008a4 openais[4332]: [TOTEM] entering OPERATIONAL state.
Aug 11 13:19:45 node4008a4 openais[4332]: [MAIN ] Killing node node4003a6.localdomain because it has rejoined the cluster without cman_tool join
and node4003a6 showed the following:

Quote:
Aug 10 13:38:15 node4003a6 ccsd[3953]: Cluster is not quorate. Refusing connection.
Aug 10 13:38:15 node4003a6 ccsd[3953]: Error while processing connect: Connection refused
Aug 10 13:38:16 node4003a6 openais[3983]: [TOTEM] entering GATHER state from 9.
Aug 10 13:38:16 node4003a6 openais[3983]: [TOTEM] Creating commit token because I am the rep.
Aug 10 13:38:16 node4003a6 openais[3983]: [TOTEM] Saving state aru d high seq received d
Aug 10 13:38:16 node4003a6 openais[3983]: [TOTEM] entering COMMIT state.
Aug 10 13:38:16 node4003a6 openais[3983]: [TOTEM] entering RECOVERY state.
Aug 10 13:38:16 node4003a6 openais[3983]: [TOTEM] position [0] member 10.125.3.6:
Aug 10 13:38:16 node4003a6 openais[3983]: [TOTEM] previous ring seq 12 rep 10.125.3.6
Aug 10 13:38:16 node4003a6 openais[3983]: [TOTEM] aru d high delivered d received flag 0
Aug 10 13:38:16 node4003a6 openais[3983]: [TOTEM] position [1] member 10.125.8.4:
Aug 10 13:38:16 node4003a6 openais[3983]: [TOTEM] previous ring seq 12 rep 10.125.8.4
Aug 10 13:38:16 node4003a6 openais[3983]: [TOTEM] aru c high delivered c received flag 0
Aug 10 13:38:16 node4003a6 openais[3983]: [TOTEM] Did not need to originate any messages in recovery.
Aug 10 13:38:16 node4003a6 openais[3983]: [TOTEM] Storing new sequence id for ring 10
Aug 10 13:38:16 node4003a6 openais[3983]: [TOTEM] Sending initial ORF token
Aug 10 13:38:16 node4003a6 openais[3983]: [CLM ] CLM CONFIGURATION CHANGE
Aug 10 13:38:16 node4003a6 openais[3983]: [CLM ] New Configuration:
Aug 10 13:38:16 node4003a6 openais[3983]: [CLM ] r(0) ip(10.125.3.6)
Aug 10 13:38:16 node4003a6 openais[3983]: [CLM ] Members Left:
Aug 10 13:38:16 node4003a6 openais[3983]: [CLM ] Members Joined:
Aug 10 13:38:16 node4003a6 openais[3983]: [SYNC ] This node is within the primary component and will provide service.
Aug 10 13:38:16 node4003a6 openais[3983]: [CLM ] CLM CONFIGURATION CHANGE
Aug 10 13:38:16 node4003a6 openais[3983]: [CLM ] New Configuration:
Aug 10 13:38:16 node4003a6 openais[3983]: [CLM ] r(0) ip(10.125.3.6)
Aug 10 13:38:16 node4003a6 openais[3983]: [CLM ] r(0) ip(10.125.8.4)
Aug 10 13:38:16 node4003a6 openais[3983]: [CLM ] Members Left:
Aug 10 13:38:16 node4003a6 openais[3983]: [CLM ] Members Joined:
Aug 10 13:38:16 node4003a6 openais[3983]: [CLM ] r(0) ip(10.125.8.4)
Aug 10 13:38:16 node4003a6 openais[3983]: [SYNC ] This node is within the primary component and will provide service.
Aug 10 13:38:16 node4003a6 openais[3983]: [TOTEM] entering OPERATIONAL state.
Aug 10 13:38:16 node4003a6 openais[3983]: [MAIN ] Node node4008a4.localdomain not joined to cman because it has rejoined an inquorate cluster
Aug 10 13:38:16 node4003a6 openais[3983]: [CMAN ] cman killed by node 1 for reason 3
Aug 10 13:38:21 node4003a6 ccsd[3953]: Unable to connect to cluster infrastructure after 30 seconds.
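
To make the membership changes in logs like these easier to follow, here is a minimal Python sketch that pulls out each "CLM CONFIGURATION CHANGE" block with its member, left, and joined lists. It assumes the syslog layout shown in the quotes above; `membership_changes` and the regular expressions are illustrative names, not part of openais or cman, and `SAMPLE` is copied from the node4008a4 log.

```python
import re

# Matches the "openais[pid]: [TAG ] message" part of a syslog line.
LINE_RE = re.compile(r"openais\[\d+\]: \[(?P<tag>[A-Z]+) *\] (?P<msg>.*)$")
IP_RE = re.compile(r"r\(0\) ip\((?P<ip>[\d.]+)\)")
HEADERS = {
    "New Configuration:": "members",
    "Members Left:": "left",
    "Members Joined:": "joined",
}


def membership_changes(lines):
    """Return one dict per CLM CONFIGURATION CHANGE block, in log order."""
    changes, current, section = [], None, None
    for line in lines:
        m = LINE_RE.search(line)
        if not m or m.group("tag") != "CLM":
            section = None  # a non-CLM line ends the current list
            continue
        msg = m.group("msg").strip()
        if msg == "CLM CONFIGURATION CHANGE":
            current = {"members": [], "left": [], "joined": []}
            changes.append(current)
            section = None
        elif msg in HEADERS and current is not None:
            section = HEADERS[msg]
        else:
            ip = IP_RE.fullmatch(msg)
            if ip and current is not None and section is not None:
                current[section].append(ip.group("ip"))
    return changes


P = "node4008a4 openais[4332]: "
SAMPLE = [
    "Aug 11 13:15:29 " + P + "[CLM ] CLM CONFIGURATION CHANGE",
    "Aug 11 13:15:29 " + P + "[CLM ] New Configuration:",
    "Aug 11 13:15:29 " + P + "[CLM ] r(0) ip(10.125.8.4)",
    "Aug 11 13:15:29 " + P + "[CLM ] Members Left:",
    "Aug 11 13:15:29 " + P + "[CLM ] r(0) ip(10.125.3.6)",
    "Aug 11 13:15:29 " + P + "[CLM ] Members Joined:",
    "Aug 11 13:15:29 " + P + "[SYNC ] This node is within the primary "
    "component and will provide service.",
    "Aug 11 13:19:45 " + P + "[CLM ] CLM CONFIGURATION CHANGE",
    "Aug 11 13:19:45 " + P + "[CLM ] New Configuration:",
    "Aug 11 13:19:45 " + P + "[CLM ] r(0) ip(10.125.3.6)",
    "Aug 11 13:19:45 " + P + "[CLM ] r(0) ip(10.125.8.4)",
    "Aug 11 13:19:45 " + P + "[CLM ] Members Left:",
    "Aug 11 13:19:45 " + P + "[CLM ] Members Joined:",
    "Aug 11 13:19:45 " + P + "[CLM ] r(0) ip(10.125.3.6)",
]

if __name__ == "__main__":
    for change in membership_changes(SAMPLE):
        print(change)
```

On the sample, the first block records 10.125.3.6 leaving and the second records it joining again, which is the drop-and-rejoin pattern visible in the full log.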
My question is: why does running groupd cause node4003a6 to disconnect at all? groupd is not mentioned in any of the docs I have seen except for this one (although, again, most of the other docs were for RHEL4.1 or earlier). I'm at a loss as to how to combat this problem, other than what I have already tried. Any ideas?
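
For reference, the vote arithmetic behind the two configurations described above can be sketched in Python. This assumes cman applies a plain majority rule, quorum = expected_votes // 2 + 1, and ignores any special-case modes. The function names are illustrative and not part of any cluster tool.

```python
def quorum(expected_votes):
    """Majority threshold: more than half of the expected votes."""
    return expected_votes // 2 + 1


def is_quorate(present_votes, expected_votes):
    """True if the votes currently present reach the quorum threshold."""
    return present_votes >= quorum(expected_votes)


def lone_node_report(votes, expected_votes):
    """For each node, would it be quorate if it were the only member left?"""
    return {node: is_quorate(v, expected_votes) for node, v in votes.items()}


# The two vote layouts tried in the post.
ORIGINAL = ({"node4008a4": 1, "node4003a6": 1}, 2)
CHANGED = ({"node4008a4": 2, "node4003a6": 1}, 3)

if __name__ == "__main__":
    for label, (votes, expected) in (("original", ORIGINAL), ("changed", CHANGED)):
        print(label, "quorum =", quorum(expected), lone_node_report(votes, expected))
```

Under that assumption, neither node is quorate alone in the original 1 + 1 layout, while in the 2 + 1 layout node4008a4 stays quorate by itself and node4003a6 does not, which is consistent with the "Cluster is not quorate" message from ccsd on node4003a6.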
 
  

