LinuxQuestions.org
Linux - Server This forum is for the discussion of Linux Software used in a server related context.

02-07-2012, 02:18 PM   #1
rhodes187
LQ Newbie
Registered: Jul 2005
Posts: 8

RHEL4 cluster

We have a three-node RHEL 4 cluster that is set up solely to share several GFS file systems. We recently had to reboot one of the three nodes, and now I cannot get it to rejoin the cluster. It hangs for a long time on starting cman, then eventually fails and moves on, but the node never joins the cluster. I see the following in the logs:

Feb 7 15:05:13 lakeside kernel: CMAN: Waiting to join or form a Linux-cluster
Feb 7 15:05:13 lakeside ccsd[4801]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1.7.6
Feb 7 15:05:13 lakeside ccsd[4801]: Initial status:: Inquorate

cman_tool status on one of the working nodes shows (I removed the IP):
Protocol version: 5.0.1
Config version: 158
Cluster name: webfarm
Cluster ID: 13957
Cluster Member: Yes
Membership state: Cluster-Member
Nodes: 2
Expected_votes: 2
Total_votes: 2
Quorum: 2
Active subsystems: 64
Node name: ironstone
Node ID: 1
Node addresses: x.x.x.x

cman_tool status on the node that won't join shows:
[root@lakeside log]# cman_tool status
Protocol version: 5.0.1
Config version: 158
Cluster name: webfarm
Cluster ID: 13957
Cluster Member: No
Membership state: Joining

cman_tool nodes on a working node shows:
Node  Votes  Exp  Sts  Name
   1      1    2    M  ironstone
   2      1    3    X  lakeside
   3      1    2    M  dawson

Why does the broken node expect 3 votes? Is this causing the issue? Is there any way to have it expect 2 like the others?

We're retiring this cluster in the next month, but would really like to get it working. Is there any way to get things working again without rebooting the other nodes?
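
On the expected-votes question, the arithmetic can be sketched as follows. This is only an illustration: it assumes cman derives the quorum threshold as a simple majority of the expected votes, floor(expected / 2) + 1, and that the cluster is not running in cman's special two-node mode. The function name quorum_for is made up for this example.

```shell
# Quorum threshold as a strict majority of the expected votes.
# Assumption: cman (outside its two-node mode) uses floor(expected / 2) + 1.
quorum_for() {
    expected=$1
    echo $(( expected / 2 + 1 ))
}

quorum_for 2   # expected votes reported by ironstone and dawson
quorum_for 3   # expected votes shown for lakeside
```

Under that assumption both calls print 2, which matches the "Quorum: 2" line in the status output above, so the differing Exp value alone would not change how many votes the cluster needs.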
 
02-07-2012, 04:17 PM   #2
kbp
Senior Member
Registered: Aug 2009
Posts: 3,048

I haven't had much experience with this, but on lakeside I'd try 'cman_tool expected -e 2', then 'cman_tool join'.
 
02-07-2012, 04:48 PM   #3
rhodes187
LQ Newbie
Registered: Jul 2005
Posts: 8
Original Poster

Thanks for responding, but the node is not in the cluster, so this doesn't seem to work:

[root@lakeside network-scripts]# cman_tool expected -e2
cman_tool: can't set expected votes: Node is not yet a cluster member

If I restart ccsd and run cman_tool join, I see the following:

Feb 7 17:45:53 lakeside ccsd[11932]: cluster.conf (cluster name = webfarm, version = 158) found.
Feb 7 17:45:53 lakeside ccsd[11932]: Remote copy of cluster.conf is from quorate node.
Feb 7 17:45:53 lakeside ccsd[11932]: Local version # : 158
Feb 7 17:45:53 lakeside ccsd[11932]: Remote version #: 158
Feb 7 17:45:53 lakeside kernel: CMAN: Waiting to join or form a Linux-cluster
Feb 7 17:45:53 lakeside ccsd[11932]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1.7.6
Feb 7 17:45:53 lakeside ccsd[11932]: Initial status:: Inquorate
Feb 7 17:45:55 lakeside kernel: CMAN: sending membership request

We do NOT have iptables running, and all nodes are on the same network. I also know that all /etc/cluster/cluster.conf files are in sync.
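
One way to double-check that the configuration files really are identical is to compare checksums. The sketch below is a generic illustration, not a cluster-specific tool: the helper name confs_in_sync and the /tmp paths are hypothetical, and it assumes each node's file has first been copied to a single host.

```shell
# Succeeds only if every file passed in has the same MD5 checksum.
confs_in_sync() {
    [ "$(md5sum "$@" | awk '{print $1}' | sort -u | wc -l)" -eq 1 ]
}

# Hypothetical usage, after copying each node's file to one host:
#   scp dawson:/etc/cluster/cluster.conf /tmp/cluster.conf.dawson
#   confs_in_sync /etc/cluster/cluster.conf /tmp/cluster.conf.dawson && echo "in sync"
```

A matching version number (158 on both sides in the log above) does not by itself prove the file contents match, which is why a checksum comparison may be worth the extra minute.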
 
02-07-2012, 08:37 PM   #4
kbp
Senior Member
Registered: Aug 2009
Posts: 3,048

Is there anything relevant in the logs on the working nodes?
 
02-08-2012, 07:47 AM   #5
rhodes187
LQ Newbie
Registered: Jul 2005
Posts: 8
Original Poster

No, unfortunately the working nodes have nothing in their logs about this third node trying to rejoin the cluster.
 
02-08-2012, 03:37 PM   #6
kbp
Senior Member
Registered: Aug 2009
Posts: 3,048

Have you confirmed that network connectivity is present? I would have thought there'd be something in the logs when a host attempted to join the cluster. Maybe run tcpdump to make sure the packets are arriving.
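
A rough sketch of the tcpdump check follows. It rests on two assumptions that should be verified against the actual setup: that cman on RHEL 4 sends its membership traffic over UDP port 6809, and that eth0 is the interface the cluster uses. The helper name cman_capture_cmd is made up for this example; it only prints the command so the filter can be reviewed before running it as root.

```shell
# Print a tcpdump command line for watching cluster-manager traffic.
# Assumptions: cman uses UDP port 6809 and eth0 is the cluster interface;
# pass other values as arguments if the setup differs.
cman_capture_cmd() {
    iface=${1:-eth0}
    port=${2:-6809}
    echo "tcpdump -n -i $iface udp port $port"
}

cman_capture_cmd   # run the printed command as root on a working node
```

If packets from lakeside show up on a working node while lakeside logs "sending membership request", the network path is probably fine and the problem lies elsewhere; if nothing arrives, connectivity or broadcast/multicast handling is the first thing to look at.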
 
  

