Old 06-14-2012, 12:01 PM   #1
kirukan
Fencing in Red Hat cluster


Is fencing configuration really so important in a two-node cluster? For testing, I unplugged and re-plugged the network cable on one node, and then I received an error like "fencing device failed to connect". Can anybody help me, please?
Also, when a node is powered off and then powered back on, does it automatically rejoin the cluster?
 
Old 06-14-2012, 05:17 PM   #2
kbscores
Need more information. For example, if you are using this with a database and the user that runs the database is not available on the other server, it will fail. Or, if an account-inactivity setting is set to 10 days, then after 10 days without anyone logging into the account directly, the account will lock and the cluster will no longer be able to get at the data. There are several reasons why it could fail; that would be one of the first things I'd check, though.
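As one hedged illustration of that check (not from the original reply; "dbuser" is a hypothetical account name), you could look at the account's aging and lock status on each node:

Code:
# Show password aging and inactivity settings for the account
chage -l dbuser

# Show the account's lock status (LK = locked, PS = password set)
passwd -S dbuser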
 
Old 06-14-2012, 11:33 PM   #3
kirukan
For testing I just unplugged nodeB's network cable and then got this error. Now it will not rejoin the cluster.
Quote:
Jun 15 11:58:36 nodeB fenced[28160]: telling cman to remove nodeid 1 from cluster
Jun 15 11:58:46 nodeB corosync[28095]: [TOTEM ] A processor failed, forming new configuration.
Jun 15 11:58:48 nodeB corosync[28095]: [QUORUM] Members[1]: 2
Jun 15 11:58:48 nodeB corosync[28095]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 15 11:58:48 nodeB corosync[28095]: [CPG ] downlist received left_list: 1
Jun 15 11:58:48 nodeB corosync[28095]: [CPG ] chosen downlist from node r(0) ip(10.2.2.30)
Jun 15 11:58:48 nodeB corosync[28095]: [MAIN ] Completed service synchronization, ready to provide service.
Jun 15 11:58:48 nodeB kernel: dlm: closing connection to node 1
Jun 15 11:58:48 nodeB rgmanager[28333]: State change: nodeA DOWN
Jun 15 11:59:36 nodeB corosync[28095]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 15 11:59:36 nodeB corosync[28095]: [QUORUM] Members[2]: 1 2
Jun 15 11:59:36 nodeB corosync[28095]: [QUORUM] Members[2]: 1 2
Jun 15 11:59:36 nodeB corosync[28095]: [CPG ] downlist received left_list: 0
Jun 15 11:59:36 nodeB corosync[28095]: [CPG ] downlist received left_list: 0
Jun 15 11:59:36 nodeB corosync[28095]: [CPG ] chosen downlist from node r(0) ip(10.2.2.20)
Jun 15 11:59:36 nodeB corosync[28095]: [MAIN ] Completed service synchronization, ready to provide service.
Jun 15 11:59:50 nodeB corosync[28095]: [TOTEM ] A processor failed, forming new configuration.
Jun 15 11:59:52 nodeB corosync[28095]: [QUORUM] Members[1]: 2
Jun 15 11:59:52 nodeB corosync[28095]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 15 11:59:52 nodeB corosync[28095]: [CPG ] downlist received left_list: 1
Jun 15 11:59:52 nodeB corosync[28095]: [CPG ] chosen downlist from node r(0) ip(10.2.2.30)
Jun 15 11:59:52 nodeB corosync[28095]: [MAIN ] Completed service synchronization, ready to provide service.
Jun 15 11:59:52 nodeB kernel: dlm: closing connection to node 1

Here is my cluster.conf:
Quote:
<?xml version="1.0"?>
<cluster config_version="7" name="net-cluster">
        <clusternodes>
                <clusternode name="nodeA" nodeid="1"/>
                <clusternode name="nodeB" nodeid="2"/>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <rm>
                <failoverdomains>
                        <failoverdomain name="web-domain" nofailback="1" ordered="1" restricted="0">
                                <failoverdomainnode name="nodeA" priority="1"/>
                                <failoverdomainnode name="nodeB" priority="2"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="10.2.2.50/24" monitor_link="on" sleeptime="4"/>
                        <script file="/etc/rc.d/init.d/httpd" name="httpd"/>
                </resources>
                <service domain="web-domain" exclusive="1" name="httpd" recovery="relocate">
                        <script ref="httpd"/>
                        <ip ref="10.2.2.50/24"/>
                </service>
        </rm>
</cluster>
Any idea?
Do we need to configure fencing devices for failover to work?
 
Old 06-15-2012, 09:23 AM   #4
kbscores
Maybe try something like this:

Code:
<fence_daemon post_fail_delay="0" post_join_delay="30" />
<clusternodes>
  <clusternode name="nodeA" nodeid="1">
    <fence />
  </clusternode>
  <clusternode name="nodeB" nodeid="2">
    <fence />
  </clusternode>
</clusternodes>
<fencedevices>
  <fencedevice agent="fence_manual" name="Manual_Fence" />
</fencedevices>
Everything else looks fine in your conf file.
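Not part of the original reply, but as a rough sketch of the follow-up steps (RHEL 6 cman tooling): after editing /etc/cluster/cluster.conf and bumping config_version, you would typically validate and propagate the config, and with fence_manual you also have to acknowledge a fence by hand before recovery continues (the exact fence_ack_manual arguments vary between releases).

Code:
# Validate the edited /etc/cluster/cluster.conf
ccs_config_validate

# Push the new config_version to all running cluster nodes
cman_tool version -r

# With fence_manual, acknowledge the fenced node by hand once it is safely down
# (command arguments differ slightly between RHEL releases)
fence_ack_manual nodeA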
 
Old 06-21-2012, 11:18 AM   #5
kirukan
Thanks kbscores. It seems the following configuration is working:

Code:
<?xml version="1.0"?>
<cluster config_version="21" name="net-cluster">
        <fence_daemon post_join_delay="30"/>
        <clusternodes>
                <clusternode name="nodeA" nodeid="1">
                        <fence>
                                <method name="1">
                                        <device name="Manual_Fence" nodename="nodeA"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="nodeB" nodeid="2">
                        <fence>
                                <method name="1">
                                        <device name="Manual_Fence" nodename="nodeB"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_manual" name="Manual_Fence"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="web-domain" nofailback="1" ordered="1" restricted="0">
                                <failoverdomainnode name="nodeA" priority="1"/>
                                <failoverdomainnode name="nodeB" priority="2"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="10.2.2.50/24" monitor_link="on" sleeptime="4"/>
                        <script file="/etc/rc.d/init.d/httpd" name="httpd"/>
                </resources>
                <service domain="web-domain" name="httpd" recovery="relocate">
                        <script ref="httpd"/>
                        <ip ref="10.2.2.50/24"/>
                </service>
        </rm>
</cluster>

 
Old 06-21-2012, 11:40 AM   #6
kirukan
I think it is caused by rgmanager running on both nodes (per Red Hat's note that an "HA service can run on only one cluster node at a time to maintain data integrity"). In that case, how can we fail rgmanager over to the other node? Can we add it as a service so that the service relocates to another node when a node goes down? Any idea?
Code:
Jun 22 00:15:47 nodeA corosync[1319]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 22 00:15:47 nodeA corosync[1319]:   [QUORUM] Members[2]: 1 2
Jun 22 00:15:47 nodeA corosync[1319]:   [QUORUM] Members[2]: 1 2
Jun 22 00:15:47 nodeA corosync[1319]:   [CPG   ] downlist received left_list: 0
Jun 22 00:15:47 nodeA corosync[1319]:   [CPG   ] downlist received left_list: 0
Jun 22 00:15:47 nodeA corosync[1319]:   [CPG   ] chosen downlist from node r(0) ip(10.2.2.20) 
Jun 22 00:15:47 nodeA corosync[1319]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jun 22 00:15:54 nodeA rgmanager[2446]: I am node #1
Jun 22 00:15:55 nodeA rgmanager[2446]: Resource Group Manager Starting
Jun 22 00:15:55 nodeA rgmanager[2446]: Loading Service Data
Jun 22 00:15:56 nodeA rgmanager[2446]: Initializing Services
Jun 22 00:15:56 nodeA rgmanager[7000]: Executing /etc/rc.d/init.d/httpd stop
Jun 22 00:15:56 nodeA rgmanager[2446]: Services Initialized
Jun 22 00:15:56 nodeA rgmanager[2446]: State change: Local UP
Jun 22 00:15:56 nodeA rgmanager[2446]: Starting stopped service service:httpd
Jun 22 00:15:57 nodeA rgmanager[7135]: Adding IPv4 address 10.2.2.50/24 to eth0
Jun 22 00:15:59 nodeA avahi-daemon[2076]: Registering new address record for 10.2.2.50 on eth0.IPv4.
Jun 22 00:16:01 nodeA rgmanager[7220]: Executing /etc/rc.d/init.d/httpd start
Jun 22 00:16:02 nodeA rgmanager[2446]: Service service:httpd started
Jun 22 00:16:06 nodeA kernel: dlm: connecting to 2
Jun 22 00:16:06 nodeA kernel: dlm: connecting to 2
Jun 22 00:16:06 nodeA kernel: dlm: connecting to 2
Jun 22 00:16:06 nodeA kernel: dlm: connecting to 2
Jun 22 00:18:09 nodeA kernel: hrtimer: interrupt took 2423490 ns
Jun 22 00:18:35 nodeA kernel: INFO: task rgmanager:7271 blocked for more than 120 seconds.
Jun 22 00:18:35 nodeA kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 22 00:18:35 nodeA kernel: rgmanager     D 0000000000000000     0  7271   2443 0x00000080
Jun 22 00:18:35 nodeA kernel: ffff880058ecbc70 0000000000000086 ffff880058ecbc98 ffff880058ecbbf8
Jun 22 00:18:35 nodeA kernel: ffff880058ecbc98 0000000000000000 ffff880058ecbcb0 ffffffff81dff308
Jun 22 00:18:35 nodeA kernel: ffff880058da9078 ffff880058ecbfd8 000000000000f598 ffff880058da9078
Jun 22 00:18:35 nodeA kernel: Call Trace:
Jun 22 00:18:35 nodeA kernel: [<ffffffff810a09da>] ? futex_wait+0x21a/0x380
Jun 22 00:18:35 nodeA kernel: [<ffffffff814dd755>] rwsem_down_failed_common+0x95/0x1d0
Jun 22 00:18:35 nodeA kernel: [<ffffffff814dd8e6>] rwsem_down_read_failed+0x26/0x30
Jun 22 00:18:35 nodeA kernel: [<ffffffff8126e544>] call_rwsem_down_read_failed+0x14/0x30
Jun 22 00:18:35 nodeA kernel: [<ffffffff814dcde4>] ? down_read+0x24/0x30
Jun 22 00:18:35 nodeA kernel: [<ffffffffa04150b7>] dlm_user_request+0x47/0x240 [dlm]
Jun 22 00:18:35 nodeA kernel: [<ffffffff8115b0dc>] ? __kmalloc+0x20c/0x220
Jun 22 00:18:35 nodeA kernel: [<ffffffffa0422af6>] device_write+0x5f6/0x7d0 [dlm]
Jun 22 00:18:35 nodeA kernel: [<ffffffff812051a6>] ? security_file_permission+0x16/0x20
Jun 22 00:18:35 nodeA kernel: [<ffffffff81172718>] vfs_write+0xb8/0x1a0
Jun 22 00:18:35 nodeA kernel: [<ffffffff810d1b62>] ? audit_syscall_entry+0x272/0x2a0
Jun 22 00:18:35 nodeA kernel: [<ffffffff81173151>] sys_write+0x51/0x90
Jun 22 00:18:35 nodeA kernel: [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
Jun 22 00:20:35 nodeA kernel: INFO: task rgmanager:7271 blocked for more than 120 seconds.
Jun 22 00:20:35 nodeA kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 22 00:20:35 nodeA kernel: rgmanager     D 0000000000000000     0  7271   2443 0x00000080
Jun 22 00:20:35 nodeA kernel: ffff880058ecbc70 0000000000000086 ffff880058ecbc98 ffff880058ecbbf8
Jun 22 00:20:35 nodeA kernel: ffff880058ecbc98 0000000000000000 ffff880058ecbcb0 ffffffff81dff308
Jun 22 00:20:35 nodeA kernel: ffff880058da9078 ffff880058ecbfd8 000000000000f598 ffff880058da9078
Jun 22 00:20:35 nodeA kernel: Call Trace:
Jun 22 00:20:35 nodeA kernel: [<ffffffff810a09da>] ? futex_wait+0x21a/0x380
Jun 22 00:20:35 nodeA kernel: [<ffffffff814dd755>] rwsem_down_failed_common+0x95/0x1d0
Jun 22 00:20:35 nodeA kernel: [<ffffffff814dd8e6>] rwsem_down_read_failed+0x26/0x30
Jun 22 00:20:35 nodeA kernel: [<ffffffff8126e544>] call_rwsem_down_read_failed+0x14/0x30
Jun 22 00:20:35 nodeA kernel: [<ffffffff814dcde4>] ? down_read+0x24/0x30
Jun 22 00:20:35 nodeA kernel: [<ffffffffa04150b7>] dlm_user_request+0x47/0x240 [dlm]
Jun 22 00:20:35 nodeA kernel: [<ffffffff8115b0dc>] ? __kmalloc+0x20c/0x220
Jun 22 00:20:35 nodeA kernel: [<ffffffffa0422af6>] device_write+0x5f6/0x7d0 [dlm]
Jun 22 00:20:35 nodeA kernel: [<ffffffff812051a6>] ? security_file_permission+0x16/0x20
Jun 22 00:20:35 nodeA kernel: [<ffffffff81172718>] vfs_write+0xb8/0x1a0
Jun 22 00:20:35 nodeA kernel: [<ffffffff810d1b62>] ? audit_syscall_entry+0x272/0x2a0
Jun 22 00:20:35 nodeA kernel: [<ffffffff81173151>] sys_write+0x51/0x90
Jun 22 00:20:35 nodeA kernel: [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
Jun 22 00:22:35 nodeA kernel: INFO: task rgmanager:7271 blocked for more than 120 seconds.
Jun 22 00:22:35 nodeA kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 22 00:22:35 nodeA kernel: rgmanager     D 0000000000000000     0  7271   2443 0x00000080
Jun 22 00:22:35 nodeA kernel: ffff880058ecbc70 0000000000000086 ffff880058ecbc98 ffff880058ecbbf8
Jun 22 00:22:35 nodeA kernel: ffff880058ecbc98 0000000000000000 ffff880058ecbcb0 ffffffff81dff308
Jun 22 00:22:35 nodeA kernel: ffff880058da9078 ffff880058ecbfd8 000000000000f598 ffff880058da9078
Jun 22 00:22:35 nodeA kernel: Call Trace:
Jun 22 00:22:35 nodeA kernel: [<ffffffff810a09da>] ? futex_wait+0x21a/0x380
Jun 22 00:22:35 nodeA kernel: [<ffffffff814dd755>] rwsem_down_failed_common+0x95/0x1d0
Jun 22 00:22:35 nodeA kernel: [<ffffffff814dd8e6>] rwsem_down_read_failed+0x26/0x30
Jun 22 00:22:35 nodeA kernel: [<ffffffff8126e544>] call_rwsem_down_read_failed+0x14/0x30
Jun 22 00:22:35 nodeA kernel: [<ffffffff814dcde4>] ? down_read+0x24/0x30
Jun 22 00:22:35 nodeA kernel: [<ffffffffa04150b7>] dlm_user_request+0x47/0x240 [dlm]
Jun 22 00:22:35 nodeA kernel: [<ffffffff8115b0dc>] ? __kmalloc+0x20c/0x220
Jun 22 00:22:35 nodeA kernel: [<ffffffffa0422af6>] device_write+0x5f6/0x7d0 [dlm]
Jun 22 00:22:35 nodeA kernel: [<ffffffff812051a6>] ? security_file_permission+0x16/0x20
Jun 22 00:22:35 nodeA kernel: [<ffffffff81172718>] vfs_write+0xb8/0x1a0
Jun 22 00:22:35 nodeA kernel: [<ffffffff810d1b62>] ? audit_syscall_entry+0x272/0x2a0
Jun 22 00:22:35 nodeA kernel: [<ffffffff81173151>] sys_write+0x51/0x90
Jun 22 00:22:35 nodeA kernel: [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
Jun 22 00:24:23 nodeA kernel: dlm: connecting to 2
Jun 22 00:24:23 nodeA kernel: dlm: connecting to 2
Jun 22 00:24:23 nodeA kernel: dlm: connecting to 2
Jun 22 00:24:23 nodeA kernel: dlm: connecting to 2
..
..
..
 
Old 06-27-2012, 10:38 PM   #7
kbscores
rgmanager should be running on both nodes; the service is defined in the conf file. A key component of an application deployment with rgmanager is a script that starts and stops the application on a cluster node. For an Apache web server, that script is /etc/rc.d/init.d/httpd. Then, when you want to do a manual failover, you use the command:

Code:
clusvcadm -r <serviceName> -m <nodeName>
If you don't know the node name, run clustat.
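A minimal usage sketch with the names from this thread (assuming the service is still called httpd, as in the posted cluster.conf):

Code:
# Show cluster members and which node currently owns service:httpd
clustat

# Relocate the httpd service to nodeB
clusvcadm -r httpd -m nodeB

# Enable (start) or disable (stop) the service if needed
clusvcadm -e httpd
clusvcadm -d httpd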
 
Old 09-01-2012, 10:58 AM   #8
kirukan
The above task has been completed based on the following post
http://www.linuxquestions.org/questi...er-4175425129/
 