Old 05-21-2013, 02:11 AM   #1
salman108
LQ Newbie
 
2 Node cluster will not start.


Hello,

I am trying to set up a 2-node cluster using RHEL 5.5.
Quote:
[root@IBRMAPPPSV02 etc]# uname -a
Linux IBRMAPPPSV02 2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
[root@IBRMAPPPSV02 etc]#
I have followed these steps:

1. Hosts file entries done.
2. Quorum disk: allocated a LUN and ran mkqdisk -c /dev/sdd1 -l brmquorum (a quick sanity check follows after the cluster.conf below).
3. Installed luci/ricci. Luci is running on one of the cluster nodes.
4. Made the cluster.conf file as follows.
Code:
<?xml version="1.0"?>
<cluster alias="BRMCLUSTER" config_version="10" name="BRMCLUSTER">
        <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="IBRMAPPPSV02" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="Manual02" nodename="IBRMAPPPSV02"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="IBRMAPPPSV01" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="Manual01" nodename="IBRMAPPPSV01"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="3"/>
        <fencedevices>
                <fencedevice agent="fence_manual" name="Manual01"/>
                <fencedevice agent="fence_manual" name="Manual02"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="BRMFAIL" nofailback="1" ordered="1" restricted="1">
                                <failoverdomainnode name="IBRMAPPPSV02" priority="2"/>
                                <failoverdomainnode name="IBRMAPPPSV01" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="10.10.192.61" monitor_link="1"/>
                </resources>
                <service autostart="1" domain="BRMFAIL" exclusive="1" name="BRMSERVICE" recovery="relocate"/>
        </rm>
        <quorumd device="/dev/sdd1" interval="1" min_score="1" tko="3" votes="1">
                <heuristic interval="1" program="/usr/share/cluster/check_eth_link.sh bond0" score="1"/>
        </quorumd>
</cluster>
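As mentioned in step 2, here is a minimal sanity check of the quorum disk and of cluster membership (a sketch using the standard RHEL 5 cman/qdisk tools; run it on both nodes):

Code:
# the qdisk label written by mkqdisk should be visible from BOTH nodes
mkqdisk -L

# the quorum disk daemon has to be running and registered on both nodes
service qdiskd status

# overall membership/quorum view
cman_tool status
clustat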
Now, I have some problems.

On the first node, when I do clustat, I get this:
Quote:
[root@IBRMAPPPSV01 ~]# clustat
Cluster Status for BRMCLUSTER @ Tue May 21 09:47:04 2013
Member Status: Quorate

Member Name                          ID   Status
------ ----                          ---- ------
IBRMAPPPSV02                            1 Offline
IBRMAPPPSV01                            2 Online, Local
/dev/sdd1                               0 Online, Quorum Disk
but on the other node, when I do clustat, I get this:
Quote:
[root@IBRMAPPPSV02 ~]# clustat
Cluster Status for BRMCLUSTER @ Tue May 21 10:06:30 2013
Member Status: Inquorate

Member Name                          ID   Status
------ ----                          ---- ------
IBRMAPPPSV02                            1 Online, Local
IBRMAPPPSV01                            2 Offline
There are no problem logs on the first node; on the second node, however, this logs continuously:

Quote:
May 21 10:07:16 IBRMAPPPSV02 ccsd[14668]: Cluster is not quorate. Refusing connection.
May 21 10:07:16 IBRMAPPPSV02 ccsd[14668]: Error while processing connect: Connection refused
May 21 10:07:17 IBRMAPPPSV02 ccsd[14668]: Cluster is not quorate. Refusing connection.
May 21 10:07:17 IBRMAPPPSV02 ccsd[14668]: Error while processing connect: Connection refused
May 21 10:07:17 IBRMAPPPSV02 ccsd[14668]: Cluster is not quorate. Refusing connection.
On the problem node, when I try to run service cman start, it hangs on starting fenced. It remains hung, will not allow the node to be shut down, nor will it ever finish starting.
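Since the cluster uses fence_manual (a testing-only agent), a pending fence usually has to be acknowledged by hand, and membership can be inspected while fenced is stuck. A hedged diagnostic sketch using the standard RHEL 5 cman tools (the node name below is just the one from this thread):

Code:
# membership and quorum as each node sees it
cman_tool status
cman_tool nodes

# state of the fence/dlm groups (shows whether fenced is waiting on a fence)
group_tool ls

# with fence_manual, a pending fence must be acknowledged manually once the
# failed node is confirmed down or rebooted
fence_ack_manual -n IBRMAPPPSV02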

When I try to run service clvmd start, I get this on the problem node:

Quote:
May 21 09:53:27 IBRMAPPPSV02 kernel: dlm: no local IP address has been set
May 21 09:53:27 IBRMAPPPSV02 kernel: dlm: cannot start dlm lowcomms -107
May 21 09:53:27 IBRMAPPPSV02 clvmd: Unable to create lockspace for CLVM: Transport endpoint is not connected
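As a hedged note, the dlm "no local IP address has been set" error usually just means clvmd was started before cman had joined the cluster and become quorate on that node. The expected start order is (a sketch of the standard RHEL 5 Cluster Suite services):

Code:
# expected start order on each node; clvmd needs a working cman/dlm underneath it
service cman start        # membership, quorum, fencing, dlm
service qdiskd start      # quorum disk daemon (if not already started by cman)
service clvmd start       # clustered LVM, only once the node is a quorate member
service rgmanager start   # resource groups / cluster services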

Can someone please help me and point out where I have gone wrong?


Lastly,
Quote:
openais[29491]: [TOTEM] position [0] member 10.10.192.45


Why is openais binding to eth2 (ifcfg-eth2), when it is an unused interface?
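For what it's worth, a hedged note on this: cman/openais binds to the interface that owns the IP address the clusternode name resolves to, so if a node name resolves to the eth2 address, totem traffic will go over eth2. A minimal check (the /etc/hosts addresses shown are placeholders):

Code:
# which addresses do the cluster node names resolve to?
getent hosts IBRMAPPPSV01 IBRMAPPPSV02

# which interfaces own which addresses?
ip addr show bond0
ip addr show eth2

# if a node name resolves to the eth2 address, point it at the bond0
# (cluster) address in /etc/hosts instead, e.g. (placeholder addresses):
#   10.10.192.51   IBRMAPPPSV01
#   10.10.192.52   IBRMAPPPSV02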


best regards

Last edited by salman108; 05-21-2013 at 03:07 AM.
 
Old 05-22-2013, 04:21 AM   #2
salman108
LQ Newbie
 
Original Poster
Hello,

I figured out the problems with the rest of the setup, but I have another problem now.

I am trying to load two volume groups as resources, like so:

Code:
                <resources>
                        <ip address="10.10.192.61" monitor_link="1"/>
                        <lvm lv_name="oraclelv" name="oracle" vg_name="oraclevg"/>
                        <lvm lv_name="LogVol_opt" name="OptLvm" vg_name="VolGroup00"/>
                </resources>
                <service autostart="1" domain="FOD_1" exclusive="1" name="IPService" recovery="relocate">
                        <ip ref="10.10.192.61">
                                <lvm ref="OptLvm"/>
                                <lvm ref="oracle"/>
                        </ip>
                </service>
and the output of vgdisplay is like so

Code:
[root@IBRMAPPPSV02 ~]# vgdisplay
  Incorrect metadata area header checksum
  --- Volume group ---
  VG Name               oraclevg
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  9
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               600.00 GB
  PE Size               4.00 MB
  Total PE              153599
  Alloc PE / Size       153344 / 599.00 GB
  Free  PE / Size       255 / 1020.00 MB
  VG UUID               ca67ia-K1ic-KPAe-xiZx-nQRp-R2xe-5bb7Td

  --- Volume group ---
  VG Name               VolGroup00
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  17
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               95.84 GB
  PE Size               32.00 MB
  Total PE              3067
  Alloc PE / Size       3067 / 95.84 GB
  Free  PE / Size       0 / 0
  VG UUID               sFWZH5-uJNd-yq9z-XriI-9L74-Q64h-0lkwGc
and the output of lvdisplay is like so
Code:
[root@IBRMAPPPSV02 ~]# lvdisplay
  Incorrect metadata area header checksum
  --- Logical volume ---
  LV Name                /dev/oraclevg/oraclelv
  VG Name                oraclevg
  LV UUID                J9gXhQ-vXXX-lX9i-vQnf-3Yw3-acja-9MsGsg
  LV Write Access        read/write
  LV Status              NOT available
  LV Size                599.00 GB
  Current LE             153344
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto

  --- Logical volume ---
  LV Name                /dev/VolGroup00/LogVol_opt
  VG Name                VolGroup00
  LV UUID                dZJn88-tU2y-Gf0S-UCbs-jmxu-gWZ1-GRCG66
  LV Write Access        read/write
  LV Status              NOT available
  LV Size                95.84 GB
  Current LE             3067
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
However, I am getting the following errors when I start rgmanager.

I understand some configuration needs to be done in lvm.conf.

Code:
May 22 12:15:57 IBRMAPPPSV02 kernel: end_request: I/O error, dev sdc, sector 89
May 22 12:15:57 IBRMAPPPSV02 clurgmgrd: [13663]: <err> Unable to delete tag from oraclevg/oraclelv
May 22 12:15:57 IBRMAPPPSV02 clurgmgrd: [13663]: <err> Failed to stop oraclevg/oraclelv
May 22 12:15:57 IBRMAPPPSV02 clurgmgrd: [13663]: <err> Failed to stop oraclevg/oraclelv
May 22 12:15:57 IBRMAPPPSV02 clurgmgrd[13663]: <notice> stop on lvm "oracle" returned 1 (generic error)
May 22 12:15:57 IBRMAPPPSV02 clurgmgrd: [13663]: <err> HA LVM:  Improper setup detected
May 22 12:15:57 IBRMAPPPSV02 clurgmgrd: [13663]: <err> - "volume_list" not specified in lvm.conf
May 22 12:15:57 IBRMAPPPSV02 clurgmgrd: [13663]: <err> WARNING: An improper setup can cause data corruption
May 22 12:15:57 IBRMAPPPSV02 clurgmgrd: [13663]: <notice> Deactivating VolGroup00/LogVol_opt
May 22 12:15:57 IBRMAPPPSV02 clurgmgrd: [13663]: <notice> Making resilient lvchange -an VolGroup00/LogVol_opt
May 22 12:15:57 IBRMAPPPSV02 clurgmgrd: [13663]: <notice> Resilient command: lvchange -an VolGroup00/LogVol_opt --config devices{filter=["a|/dev/sda8|","a|/dev/sdc1|","a|/dev/sdd1|","r|.*|"]}
May 22 12:15:58 IBRMAPPPSV02 clurgmgrd: [13663]: <notice> Removing ownership tag (IBRMAPPPSV02) from VolGroup00/LogVol_opt
May 22 12:15:58 IBRMAPPPSV02 avahi-daemon[3569]: Withdrawing address record for 10.10.192.61 on bond0.
May 22 12:16:08 IBRMAPPPSV02 clurgmgrd[13663]: <crit> #12: RG service:IPService failed to stop; intervention required
May 22 12:16:08 IBRMAPPPSV02 clurgmgrd[13663]: <notice> Service service:IPService is failed
May 22 12:16:08 IBRMAPPPSV02 clurgmgrd[13663]: <crit> #13: Service service:IPService failed to stop cleanly

Can someone help me with this error?
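For reference, a hedged sketch of the lvm.conf change the "volume_list" message refers to (tag-based HA-LVM on RHEL 5). The tag must match the clusternode name from cluster.conf (@IBRMAPPPSV01 on the other node), and "rootvg" below is a placeholder for whichever VG holds the node's local root/swap volumes:

Code:
# /etc/lvm/lvm.conf (activation section), on each node -- a sketch, not verified here.
# Only VGs listed here, or LVs tagged with this node's cluster name, may be
# activated outside of rgmanager's control.
volume_list = [ "rootvg", "@IBRMAPPPSV02" ]

# The initrd must then be rebuilt so early-boot LVM activation honours the
# same volume_list, and the node rebooted:
#   mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)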
 
Old 05-24-2013, 03:38 AM   #3
bloodstreetboy
Member
 
Maybe your service doesn't switch because this happened:
Quote:
#13: Service service:IPService failed to stop cleanly
To debug your service stop, you can use rg_test test /etc/cluster/cluster.conf stop service <NAME_OF_SERVICE>

It would be easier if you could show your cluster.conf.

Read this once and try again:
https://access.redhat.com/site/docum...rrorstate.html
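For instance, with the service name from the config posted above (IPService), the stop and start paths can be exercised offline like this; rg_test ships with the rgmanager package:

Code:
# show the resource tree rgmanager builds from the config
rg_test test /etc/cluster/cluster.conf

# exercise the stop path of the service from this thread
rg_test test /etc/cluster/cluster.conf stop service IPService

# and the start path
rg_test test /etc/cluster/cluster.conf start service IPService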
 
Old 05-25-2013, 03:35 AM   #4
salman108
LQ Newbie
 
Original Poster
Hello,
Thanks for the reply. I am done with that part, and the servers seem to behave themselves now.

I am now trying to load a SAN partition in an LVM as a resource. Any pointers on that?
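For reference, a hedged sketch of what such an lvm + fs resource pair often looks like in cluster.conf; the VG/LV names, mountpoint, and service name below are placeholders, and the same HA-LVM volume_list/tagging rules as above would apply to the SAN volume group:

Code:
        <resources>
                <lvm name="sanlvm" vg_name="sanvg" lv_name="sanlv"/>
                <fs name="sanfs" device="/dev/sanvg/sanlv" mountpoint="/data"
                    fstype="ext3" force_unmount="1" self_fence="0"/>
        </resources>
        <service autostart="1" domain="BRMFAIL" name="SANSERVICE" recovery="relocate">
                <lvm ref="sanlvm">
                        <fs ref="sanfs"/>
                </lvm>
        </service>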
 
  

