Old 05-21-2013, 02:11 AM   #1
salman108
LQ Newbie
 
2 Node cluster will not start.


Hello,

I am trying to set up a 2-node cluster using RHEL 5.5.
Quote:
[root@IBRMAPPPSV02 etc]# uname -a
Linux IBRMAPPPSV02 2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
[root@IBRMAPPPSV02 etc]#
I have followed these steps:

1. Hosts file entries done.
2. Quorum disk: allocated a LUN and ran mkqdisk -c /dev/sdd1 -l brmquorum (a quick sanity check follows after the cluster.conf below).
3. Installed luci/ricci. Luci is running on one of the cluster nodes.
4. Made the cluster.conf file as follows.
Code:
<?xml version="1.0"?>
<cluster alias="BRMCLUSTER" config_version="10" name="BRMCLUSTER">
        <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="IBRMAPPPSV02" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="Manual02" nodename="IBRMAPPPSV02"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="IBRMAPPPSV01" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="Manual01" nodename="IBRMAPPPSV01"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="3"/>
        <fencedevices>
                <fencedevice agent="fence_manual" name="Manual01"/>
                <fencedevice agent="fence_manual" name="Manual02"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="BRMFAIL" nofailback="1" ordered="1" restricted="1">
                                <failoverdomainnode name="IBRMAPPPSV02" priority="2"/>
                                <failoverdomainnode name="IBRMAPPPSV01" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="10.10.192.61" monitor_link="1"/>
                </resources>
                <service autostart="1" domain="BRMFAIL" exclusive="1" name="BRMSERVICE" recovery="relocate"/>
        </rm>
        <quorumd device="/dev/sdd1" interval="1" min_score="1" tko="3" votes="1">
                <heuristic interval="1" program="/usr/share/cluster/check_eth_link.sh bond0" score="1"/>
        </quorumd>
</cluster>
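As mentioned in step 2, here is a minimal sanity check of the quorum disk and of cluster membership (a sketch using the standard RHEL 5 cman/qdisk tools; run it on both nodes):

Code:
# the qdisk label written by mkqdisk should be visible from BOTH nodes
mkqdisk -L

# the quorum disk daemon has to be running and registered on both nodes
service qdiskd status

# overall membership/quorum view
cman_tool status
clustat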
Now, I have some problems.

On the first node, when I do clustat, I get this:
Quote:
[root@IBRMAPPPSV01 ~]# clustat
Cluster Status for BRMCLUSTER @ Tue May 21 09:47:04 2013
Member Status: Quorate

Member Name                          ID   Status
------ ----                          ---- ------
IBRMAPPPSV02                            1 Offline
IBRMAPPPSV01                            2 Online, Local
/dev/sdd1                               0 Online, Quorum Disk
but on the other node, when I do clustat, I get this:
Quote:
[root@IBRMAPPPSV02 ~]# clustat
Cluster Status for BRMCLUSTER @ Tue May 21 10:06:30 2013
Member Status: Inquorate

Member Name                          ID   Status
------ ----                          ---- ------
IBRMAPPPSV02                            1 Online, Local
IBRMAPPPSV01                            2 Offline
There are no problem logs on the first node; on the second node, however, this logs continuously:

Quote:
May 21 10:07:16 IBRMAPPPSV02 ccsd[14668]: Cluster is not quorate. Refusing connection.
May 21 10:07:16 IBRMAPPPSV02 ccsd[14668]: Error while processing connect: Connection refused
May 21 10:07:17 IBRMAPPPSV02 ccsd[14668]: Cluster is not quorate. Refusing connection.
May 21 10:07:17 IBRMAPPPSV02 ccsd[14668]: Error while processing connect: Connection refused
May 21 10:07:17 IBRMAPPPSV02 ccsd[14668]: Cluster is not quorate. Refusing connection.
On the problem node, when I try to run service cman start, it hangs on starting fenced. It remains hung, will not allow the node to be shut down, nor will it ever finish starting.
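Since the cluster uses fence_manual (a testing-only agent), a pending fence usually has to be acknowledged by hand, and membership can be inspected while fenced is stuck. A hedged diagnostic sketch using the standard RHEL 5 cman tools (the node name below is just the one from this thread):

Code:
# membership and quorum as each node sees it
cman_tool status
cman_tool nodes

# state of the fence/dlm groups (shows whether fenced is waiting on a fence)
group_tool ls

# with fence_manual, a pending fence must be acknowledged manually once the
# failed node is confirmed down or rebooted
fence_ack_manual -n IBRMAPPPSV02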

When I try to run service clvmd start, I get this on the problem node:

Quote:
May 21 09:53:27 IBRMAPPPSV02 kernel: dlm: no local IP address has been set
May 21 09:53:27 IBRMAPPPSV02 kernel: dlm: cannot start dlm lowcomms -107
May 21 09:53:27 IBRMAPPPSV02 clvmd: Unable to create lockspace for CLVM: Transport endpoint is not connected
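As a hedged note, the dlm "no local IP address has been set" error usually just means clvmd was started before cman had joined the cluster and become quorate on that node. The expected start order is (a sketch of the standard RHEL 5 Cluster Suite services):

Code:
# expected start order on each node; clvmd needs a working cman/dlm underneath it
service cman start        # membership, quorum, fencing, dlm
service qdiskd start      # quorum disk daemon (if not already started by cman)
service clvmd start       # clustered LVM, only once the node is a quorate member
service rgmanager start   # resource groups / cluster services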

Can someone please help me and point out where I have gone wrong?


Lastly,
Quote:
openais[29491]: [TOTEM] position [0] member 10.10.192.45


Why is openais binding to eth2 (ifcfg-eth2), when it is an unused interface?
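For what it's worth, a hedged note on this: cman/openais binds to the interface that owns the IP address the clusternode name resolves to, so if a node name resolves to the eth2 address, totem traffic will go over eth2. A minimal check (the /etc/hosts addresses shown are placeholders):

Code:
# which addresses do the cluster node names resolve to?
getent hosts IBRMAPPPSV01 IBRMAPPPSV02

# which interfaces own which addresses?
ip addr show bond0
ip addr show eth2

# if a node name resolves to the eth2 address, point it at the bond0
# (cluster) address in /etc/hosts instead, e.g. (placeholder addresses):
#   10.10.192.51   IBRMAPPPSV01
#   10.10.192.52   IBRMAPPPSV02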


best regards

Last edited by salman108; 05-21-2013 at 03:07 AM.
 
Old 05-22-2013, 04:21 AM   #2
salman108
LQ Newbie
 
Original Poster
Hello,

I figured out the problems with the rest of the setup, but I have another problem now.

I am trying to load two volume groups as resources, like so:

Code:
                <resources>
                        <ip address="10.10.192.61" monitor_link="1"/>
                        <lvm lv_name="oraclelv" name="oracle" vg_name="oraclevg"/>
                        <lvm lv_name="LogVol_opt" name="OptLvm" vg_name="VolGroup00"/>
                </resources>
                <service autostart="1" domain="FOD_1" exclusive="1" name="IPService" recovery="relocate">
                        <ip ref="10.10.192.61">
                                <lvm ref="OptLvm"/>
                                <lvm ref="oracle"/>
                        </ip>
                </service>
and the output of vgdisplay is like so

Code:
[root@IBRMAPPPSV02 ~]# vgdisplay
  Incorrect metadata area header checksum
  --- Volume group ---
  VG Name               oraclevg
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  9
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               600.00 GB
  PE Size               4.00 MB
  Total PE              153599
  Alloc PE / Size       153344 / 599.00 GB
  Free  PE / Size       255 / 1020.00 MB
  VG UUID               ca67ia-K1ic-KPAe-xiZx-nQRp-R2xe-5bb7Td

  --- Volume group ---
  VG Name               VolGroup00
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  17
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               95.84 GB
  PE Size               32.00 MB
  Total PE              3067
  Alloc PE / Size       3067 / 95.84 GB
  Free  PE / Size       0 / 0
  VG UUID               sFWZH5-uJNd-yq9z-XriI-9L74-Q64h-0lkwGc
and the output of lvdisplay is like so
Code:
[root@IBRMAPPPSV02 ~]# lvdisplay
  Incorrect metadata area header checksum
  --- Logical volume ---
  LV Name                /dev/oraclevg/oraclelv
  VG Name                oraclevg
  LV UUID                J9gXhQ-vXXX-lX9i-vQnf-3Yw3-acja-9MsGsg
  LV Write Access        read/write
  LV Status              NOT available
  LV Size                599.00 GB
  Current LE             153344
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto

  --- Logical volume ---
  LV Name                /dev/VolGroup00/LogVol_opt
  VG Name                VolGroup00
  LV UUID                dZJn88-tU2y-Gf0S-UCbs-jmxu-gWZ1-GRCG66
  LV Write Access        read/write
  LV Status              NOT available
  LV Size                95.84 GB
  Current LE             3067
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
However, I am getting the following errors when I start rgmanager.

I understand some configuration needs to be done in lvm.conf.

Code:
May 22 12:15:57 IBRMAPPPSV02 kernel: end_request: I/O error, dev sdc, sector 89
May 22 12:15:57 IBRMAPPPSV02 clurgmgrd: [13663]: <err> Unable to delete tag from oraclevg/oraclelv
May 22 12:15:57 IBRMAPPPSV02 clurgmgrd: [13663]: <err> Failed to stop oraclevg/oraclelv
May 22 12:15:57 IBRMAPPPSV02 clurgmgrd: [13663]: <err> Failed to stop oraclevg/oraclelv
May 22 12:15:57 IBRMAPPPSV02 clurgmgrd[13663]: <notice> stop on lvm "oracle" returned 1 (generic error)
May 22 12:15:57 IBRMAPPPSV02 clurgmgrd: [13663]: <err> HA LVM:  Improper setup detected
May 22 12:15:57 IBRMAPPPSV02 clurgmgrd: [13663]: <err> - "volume_list" not specified in lvm.conf
May 22 12:15:57 IBRMAPPPSV02 clurgmgrd: [13663]: <err> WARNING: An improper setup can cause data corruption
May 22 12:15:57 IBRMAPPPSV02 clurgmgrd: [13663]: <notice> Deactivating VolGroup00/LogVol_opt
May 22 12:15:57 IBRMAPPPSV02 clurgmgrd: [13663]: <notice> Making resilient lvchange -an VolGroup00/LogVol_opt
May 22 12:15:57 IBRMAPPPSV02 clurgmgrd: [13663]: <notice> Resilient command: lvchange -an VolGroup00/LogVol_opt --config devices{filter=["a|/dev/sda8|","a|/dev/sdc1|","a|/dev/sdd1|","r|.*|"]}
May 22 12:15:58 IBRMAPPPSV02 clurgmgrd: [13663]: <notice> Removing ownership tag (IBRMAPPPSV02) from VolGroup00/LogVol_opt
May 22 12:15:58 IBRMAPPPSV02 avahi-daemon[3569]: Withdrawing address record for 10.10.192.61 on bond0.
May 22 12:16:08 IBRMAPPPSV02 clurgmgrd[13663]: <crit> #12: RG service:IPService failed to stop; intervention required
May 22 12:16:08 IBRMAPPPSV02 clurgmgrd[13663]: <notice> Service service:IPService is failed
May 22 12:16:08 IBRMAPPPSV02 clurgmgrd[13663]: <crit> #13: Service service:IPService failed to stop cleanly

Can someone help me with this error?
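For reference, a hedged sketch of the lvm.conf change the "volume_list" message refers to (tag-based HA-LVM on RHEL 5). The tag must match the clusternode name from cluster.conf (@IBRMAPPPSV01 on the other node), and "rootvg" below is a placeholder for whichever VG holds the node's local root/swap volumes:

Code:
# /etc/lvm/lvm.conf (activation section), on each node -- a sketch, not verified here.
# Only VGs listed here, or LVs tagged with this node's cluster name, may be
# activated outside of rgmanager's control.
volume_list = [ "rootvg", "@IBRMAPPPSV02" ]

# The initrd must then be rebuilt so early-boot LVM activation honours the
# same volume_list, and the node rebooted:
#   mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)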
 
Old 05-24-2013, 03:38 AM   #3
bloodstreetboy
Member
 
Maybe your service doesn't switch because this happened:
Quote:
#13: Service service:IPService failed to stop cleanly
To debug your service stop, you can use rg_test test /etc/cluster/cluster.conf stop service <NAME_OF_SERVICE>

It would be easier if you could show your cluster.conf.

Read this once and try again:
https://access.redhat.com/site/docum...rrorstate.html
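For instance, with the service name from the config posted above (IPService), the stop and start paths can be exercised offline like this; rg_test ships with the rgmanager package:

Code:
# show the resource tree rgmanager builds from the config
rg_test test /etc/cluster/cluster.conf

# exercise the stop path of the service from this thread
rg_test test /etc/cluster/cluster.conf stop service IPService

# and the start path
rg_test test /etc/cluster/cluster.conf start service IPService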
 
Old 05-25-2013, 03:35 AM   #4
salman108
LQ Newbie
 
Original Poster
Hello,
Thanks for the reply. I am done with that part, and the servers seem to behave themselves now.

I am now trying to load a SAN partition in an LVM as a resource. Any pointers on that?
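For reference, a hedged sketch of what such an lvm + fs resource pair often looks like in cluster.conf; the VG/LV names, mountpoint, and service name below are placeholders, and the same HA-LVM volume_list/tagging rules as above would apply to the SAN volume group:

Code:
        <resources>
                <lvm name="sanlvm" vg_name="sanvg" lv_name="sanlv"/>
                <fs name="sanfs" device="/dev/sanvg/sanlv" mountpoint="/data"
                    fstype="ext3" force_unmount="1" self_fence="0"/>
        </resources>
        <service autostart="1" domain="BRMFAIL" name="SANSERVICE" recovery="relocate">
                <lvm ref="sanlvm">
                        <fs ref="sanfs"/>
                </lvm>
        </service>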
 
  

