05-24-2011, 01:03 PM   #1
arunabh_biswas, Member (Registered: Jun 2006, Posts: 92)
Red Hat Clustering & Shared Storage


Greetings for the day!

I'm implementing a 2-node cluster on RHEL 5.5. My setup is:

Total member nodes - 2
Host OS - RHEL 5.5 64-bit (using RHCS for clustering)
LAN cards - 4 per server
HBA - 2 cards per server (single-ported, for SAN access, connected to the Brocade SAN switches)
IPMI - iLO2 (for fencing, but not sure whether to use it or not)

Now, my plan is to implement a 2-node cluster with failover. For network failover I've created two bonds on each server, i.e. bond0 (eth0 & eth2) and bond1 (eth1 & eth3). I'll connect bond0 to my network switch for client access (two cables, one from each slave card) and bond1 to the other node in the cluster (back-to-back connectivity) for sharing cluster information (you can say the heartbeat); a rough sketch of the bond definitions is below.
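For reference, the bonds are defined roughly like this on each node (a sketch only: the mode and the addresses shown are placeholders, not my exact values):

Code:
# /etc/modprobe.conf
alias bond0 bonding
alias bond1 bonding

# /etc/sysconfig/network-scripts/ifcfg-bond1  (the heartbeat bond)
DEVICE=bond1
IPADDR=10.0.0.1
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none
BONDING_OPTS="mode=active-backup miimon=100"

# /etc/sysconfig/network-scripts/ifcfg-eth1  (ifcfg-eth3 is identical except for DEVICE)
DEVICE=eth1
MASTER=bond1
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none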

We'll use iLO as the fencing device (IPMI / power fencing) and connect both iLO ports to the network switch.

Also, we've connected the nodes to two SAN switches with an FC cable from each HBA card (i.e. HBA1 to SAN SW1 and HBA2 to SAN SW2), so that they can discover and mount the mapped LUNs.

Before implementing this, I need your help to clarify the doubts below:

1) As we've planned to use iLO as the fencing device, it tends to reboot or shut down the erroneous node in the cluster. What will happen when that node reboots and comes back online?
a) If we fall into such a situation, can iLO be configured to shut down the problematic node (rather than restart it), so that it won't try to fence the other, working node?

2) If the back-to-back connectivity is lost (the heartbeat connection on bond1), I read somewhere that it leads to a SPLIT-BRAIN situation where each node tries to fence the other.
a) In such a case, which node fences the other first?
b) Can both nodes write to the commonly assigned storage LUNs? If yes, what happens? Is it going to corrupt the data? What is the remedy for that?
c) Is this the reason why a QUORUM disk is used, to avoid such a situation? If yes, how critical is it if we don't use a qdisk with iLO fencing?

3) Can Qdisk & ILO fencing combination is the best solution for a 2 node cluster ? What will be the situation, if we don't use Qdisk with ILO fencing?

4) I also read that, by default, eth0 or bond0 is used for the heartbeat connection. If that is the case, the heartbeat will go over my public network, which is not a good idea. Can we change the default network interface for heartbeat packets?

5) Finally, the LUNs mapped to these member nodes are discovered, but their device names are not persistent; they change on every reboot. I've searched and found a lot of documents by googling, but none was specific or properly addressed my issue. What I understand from those docs is that I have to set up persistent binding of the LUN devices. Kindly guide me through that step by step.

Kindly help me reach some fruitful conclusion. If possible, please use some examples or diagrams. If you need more input, kindly let me know.

Thanks

Arunabh

Last edited by arunabh_biswas; 05-24-2011 at 01:08 PM.
 
05-24-2011, 02:01 PM   #2
mpapet, Member (Registered: Nov 2003, Location: Los Angeles, Distribution: Debian, Posts: 548)
The quorum is assigned to the active node of the cluster.
If you are running a clustered application that needs a LUN (not the quorum), the active node is the ONLY one with access to the LUN.

We don't use iLO in our clusters, just to avoid the complexity. If an active node fails, the cluster rolls over to the other machine. Reboot the failed node and it rejoins the cluster as the passive node, like magic!

Some of the doomsday scenarios you describe are extremely low probability. We keep a "hot spare" cluster for the case where we couldn't simply shut down one node and clear the quorum before restarting after a split-brain scenario.

I need some clarification on the networking setup. I just made up the IP addresses.

Node1: bond0-192.168.1.4 to switch, bond1-10.0.0.1 to Node2
Node2: bond0-192.168.1.5 to switch, bond1-10.0.0.2 to Node1

Is that correct?

Last edited by mpapet; 05-24-2011 at 02:05 PM.
 
05-25-2011, 01:27 AM   #3
arunabh_biswas, Member (Original Poster)
Thanks, mpapet, for your reply. I have a few more queries inline with your reply.

Quote:
Originally Posted by mpapet
The quorum is assigned to the active node of the cluster.
How should I configure the qdisk so that it is primarily mapped to the primary node and, if the primary goes down, can be accessed from the secondary node?


Quote:
If you are running a clustered application that needs a LUN (not the quorum), the active node is the ONLY one with access to the LUN.
We have assigned 4 LUNs to both nodes (200 GB, 100 GB, 25 GB, 25 GB). The problem I'm facing is that the device names the OS assigns to each LUN (i.e. dm-10, dm-11, etc.) keep changing after a reboot, even though we've installed multipath software (HPDM).

Quote:
We don't use iLO in our clusters, just to avoid the complexity. If an active node fails, the cluster rolls over to the other machine. Reboot the failed node and it rejoins the cluster as the passive node, like magic!
You mean to say that we can implement the cluster without iLO fencing, but what are the implications of that? Without iLO fencing, if the back-to-back connectivity fails (I know it's rare, but I have to be prepared for it, as we don't have a "hot spare" cluster like you do), what would be the remedy?



Quote:
I need some clarification on the networking setup. I just made up the IP addresses.
Node1: bond0-192.168.1.4 to switch, bond1-10.0.0.1 to Node2
Node2: bond0-192.168.1.5 to switch, bond1-10.0.0.2 to Node1

Is that correct?
Yes, we're following the same kind of IP schema: bond0 is for the public network (LAN) and bond1 for the private one (heartbeat).

Also, please advise on persistent binding too, as we're planning to use those LUNs with LVM after implementing persistent binding.
 
05-25-2011, 02:01 PM   #4
mpapet, Member
Quote:
Originally Posted by arunabh_biswas
How should I configure the qdisk so that it is primarily mapped to the primary node and, if the primary goes down, can be accessed from the secondary node?
The cluster will do this for you. Just let the cluster manage the quorum.

Quote:
Originally Posted by arunabh_biswas
We have assigned 4 LUNs to both nodes (200 GB, 100 GB, 25 GB, 25 GB).
Whatever LUNs are needed by the cluster will be controlled by the active node in the cluster. As long as both nodes are registered on the SAN, you should be good to go. Don't try to do anything fancy with the LUNs outside of the cluster; it will end in tears.

Quote:
Originally Posted by arunabh_biswas
The problem I'm facing is that the device names the OS assigns to each LUN (i.e. dm-10, dm-11, etc.) keep changing after a reboot, even though we've installed multipath software (HPDM).
If you aren't already registered over at HP's forums, do so; you should get an answer there. As for using the LUNs in LVM and then having the LVM presented to the cluster: that creates sufficient conditions for a spectacular failure. Manage the disk size at the SAN. It's the right tool for the job.

Quote:
Originally Posted by arunabh_biswas
You mean to say that we can implement the cluster without iLO fencing, but what are the implications of that? Without iLO fencing, if the back-to-back connectivity fails (I know it's rare, but I have to be prepared for it, as we don't have a "hot spare" cluster like you do), what would be the remedy?
Talk about a corner case! Your odds of winning the lottery are better, especially with redundant NICs.

If you ever get into a split-brain scenario, there's no fiddling around: shut down both nodes, clear the quorum disk in single-user mode on one node, and then continue in multi-user mode. Then do the same on the other node. If the other node does not rejoin, rebuild the second node. I've had to do this before, but never because of a split brain.
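Roughly, the "clear the quorum" step looks like this (just a sketch; the device path and the label are made up, and double-check the mkqdisk options on your release before wiping anything):

Code:
# on one node, booted to single-user mode
mkqdisk -L                               # list existing quorum disks and their labels
mkqdisk -c /dev/mapper/mpath2 -l qdisk   # re-initialise the quorum disk (this erases its state)
# then boot into multi-user mode, let the cluster form, and only afterwards
# bring the second node back (or rebuild it if it refuses to rejoin)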

Don't take my word for it. Build one and find out.

Last edited by mpapet; 05-25-2011 at 02:07 PM.
 
05-25-2011, 04:12 PM   #5
elcody02, Member (Registered: Jun 2007, Posts: 52)
Quote:
Originally Posted by arunabh_biswas
Greetings for the day!
..
1) As we've planned to use iLO as the fencing device, it tends to reboot or shut down the erroneous node in the cluster. What will happen when that node reboots and comes back online?
That depends. If the problem was related to a software bug, you are lucky: the node joins the cluster again and everything is OK.
If the problem persists, you run into the so-called "PING-PONG" effect, as you might have guessed.
Quote:
Originally Posted by arunabh_biswas
a) If we fall into such a situation, can iLO be configured to shut down the problematic node (rather than restart it), so that it won't try to fence the other, working node?
Yes, it can: in cluster.conf, configure the node's fence device entry with option="off" (spelled action="off" on some releases). This should normally work.
If you are in doubt, have a look at the man page of the fence agent; it should tell you which options to use.
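In cluster.conf terms this ends up looking roughly like the sketch below (names, addresses and credentials are made up; whether the attribute is spelled option or action depends on the release, so check man fence_ilo):

Code:
<clusternode name="node1-hb" nodeid="1" votes="1">
  <fence>
    <method name="1">
      <device name="ilo-node1" option="off"/>
    </method>
  </fence>
</clusternode>

<fencedevices>
  <fencedevice agent="fence_ilo" name="ilo-node1" hostname="10.10.10.11" login="fenceuser" passwd="secret"/>
</fencedevices>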
Quote:
Originally Posted by arunabh_biswas
2) If the back-to-back connectivity is lost (the heartbeat connection on bond1), I read somewhere that it leads to a SPLIT-BRAIN situation where each node tries to fence the other.
a) In such a case, which node fences the other first?
b) Can both nodes write to the commonly assigned storage LUNs? If yes, what happens? Is it going to corrupt the data? What is the remedy for that?
c) Is this the reason why a QUORUM disk is used, to avoid such a situation? If yes, how critical is it if we don't use a qdisk with iLO fencing?
The split-brain situation in a two-node cluster without a qdisk leads to a fencing race, which means the first one wins. iLO cards tend to let both nodes "win": they are quite slow, both nodes often issue the fencing simultaneously, and both get reset afterwards ;-).
With a qdisk, the node that gets quorum (and therefore the power to fence) is the one that has access to the storage and succeeds in as many heuristics as you specify (ping a tiebreaker IP, ...).
But even with a qdisk (without heuristics) a split-brain situation might still lead to a fencing race: both nodes might not see each other on the heartbeat network but can still access the storage (unlikely, but not impossible).
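As an illustration, a qdisk with a ping heuristic against a tiebreaker IP (your gateway, for example) is declared in cluster.conf roughly like this (all values here are made up; the interval/tko timeouts have to be tuned against your storage and cman timeouts):

Code:
<quorumd interval="1" tko="10" votes="1" label="qdisk">
  <heuristic program="ping -c1 -w1 192.168.1.1" score="1" interval="2" tko="3"/>
</quorumd>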
Quote:
Originally Posted by arunabh_biswas
3) Is the qdisk & iLO fencing combination the best solution for a 2-node cluster? What happens if we don't use a qdisk with iLO fencing?
As I said before.
Don't forget that if you want to use the qdisk, you need to be very picky about the timeouts.
Quote:
Originally Posted by arunabh_biswas
4) I also read that, by default, eth0 or bond0 is used for the heartbeat connection. If that is the case, the heartbeat will go over my public network, which is not a good idea. Can we change the default network interface for heartbeat packets?
Yes, you can. The node name specified in cluster.conf needs to resolve to an IP address, and the NIC that holds that IP is the one used for heartbeat.
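For example (made-up names and addresses): if the node names used in cluster.conf resolve to the bond1 addresses, the heartbeat stays on the private link:

Code:
# /etc/hosts on both nodes
10.0.0.1   node1-hb
10.0.0.2   node2-hb

<!-- cluster.conf: the clusternode names are the ones that resolve to bond1 -->
<clusternodes>
  <clusternode name="node1-hb" nodeid="1" votes="1"/>
  <clusternode name="node2-hb" nodeid="2" votes="1"/>
</clusternodes>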
Quote:
Originally Posted by arunabh_biswas
5) Finally, the LUNs mapped to these member nodes are discovered, but their device names are not persistent; they change on every reboot. I've searched and found a lot of documents by googling, but none was specific or properly addressed my issue. What I understand from those docs is that I have to set up persistent binding of the LUN devices. Kindly guide me through that step by step.
Use device-mapper-multipath. With it you get persistent bindings, i.e. persistent block device names /dev/mapper/mpathXX (those are mapped to the /dev/dm-YY devices). Don't forget to synchronize the file /var/lib/multipath/bindings between both nodes, as this file stores the mappings.
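A minimal sketch of the relevant /etc/multipath.conf pieces (the WWID and alias below are placeholders; multipath -ll shows the real WWIDs on your boxes):

Code:
defaults {
        # user_friendly_names gives the persistent /dev/mapper/mpathN names;
        # the name-to-WWID map is kept in /var/lib/multipath/bindings
        user_friendly_names yes
}

# optional: pin a LUN to a fixed, meaningful alias instead of mpathN
multipaths {
        multipath {
                wwid    360000000000000000000000000000001
                alias   data200g
        }
}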

Hope this helps somehow.
 
05-26-2011, 03:14 AM   #6
arunabh_biswas, Member (Original Poster)
Dear Mpapet & Elcody02,

Thanks to both of you for the clarifications and suggestions; they've cleared up a lot of my confusion. I just want to summarize what I've taken from all this and ask you both to confirm it. As per my requirement, I'll go this way:

1) First, I'll configure HPDM (the HP multipath driver) to make the storage LUN names persistent, so that they won't change or differ across the cluster.

2) If I got it correctly, it's not mandatory to use a quorum disk in my 2-node cluster scenario. All the resources/services will be kept on the primary node; if it fails, the cluster services will take care of it and move them to the other node in the cluster. But in that case, if a SPLIT-BRAIN issue arises, how will the nodes act (or find out which one is the failing node) and know whom to fence?

3) To change the default network path for the heartbeat, I'll first map the IP of the heartbeat network to the hostname mentioned in cluster.conf, in the /etc/hosts file.

4) In a 2-node cluster, how many failover domains should I create so that it fails over properly?

5) Does GFS2 have any feature that eliminates the requirement for a qdisk?

I know I'm asking too much, but if possible, do you guys have any documentation (with screenshots) of a 2-node Red Hat cluster configuration with FC SAN access (maybe something you prepared for a production or practice site), to make my life a little bit easier?


Once again, thanks a lot for your replies.

Regards
Arunabh

Last edited by arunabh_biswas; 05-26-2011 at 03:18 AM.
 
05-26-2011, 01:44 PM   #7
elcody02, Member
Quote:
Originally Posted by arunabh_biswas
Dear Mpapet & Elcody02,
1) First, I'll configure HPDM (the HP multipath driver) to make the storage LUN names persistent, so that they won't change or differ across the cluster.
This depends on the type of storage you are using behind the Brocade switches.
Keep in mind that the HP multipath driver is nothing other than device-mapper-multipath.
If you are using HP storage, you can easily go with it; otherwise, stick with the Red Hat packages.
Quote:
Originally Posted by arunabh_biswas
2) If I got it correctly, it's not mandatory to use a quorum disk in my 2-node cluster scenario. All the resources/services will be kept on the primary node; if it fails, the cluster services will take care of it and move them to the other node in the cluster. But in that case, if a SPLIT-BRAIN issue arises, how will the nodes act (or find out which one is the failing node) and know whom to fence?
Right, you don't need the quorum disk. But it makes split-brain situations more unlikely (at the price of more complexity and very big timeouts, if you follow Red Hat best practices).
Quote:
Originally Posted by arunabh_biswas
3) To change the default network path for the heartbeat, I'll first map the IP of the heartbeat network to the hostname mentioned in cluster.conf, in the /etc/hosts file.
Exactly.
Quote:
Originally Posted by arunabh_biswas
4) In a 2-node cluster, how many failover domains should I create so that it fails over properly?
Again I would say: follow the KISS rule. There is an implicit failover domain of all the nodes in the cluster (two) with equal priority. If you want a specific node to be the primary node, you need one failover domain (the primary node having priority 1, the secondary node priority 2, if I recall correctly).
But only do such things if you have a reason.
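If you do decide you want a preferred node, the failover-domain part of cluster.conf looks roughly like this sketch (names are made up; priority 1 is the preferred node):

Code:
<rm>
  <failoverdomains>
    <failoverdomain name="prefer-node1" ordered="1" restricted="1" nofailback="0">
      <failoverdomainnode name="node1-hb" priority="1"/>
      <failoverdomainnode name="node2-hb" priority="2"/>
    </failoverdomain>
  </failoverdomains>
  <service name="myservice" domain="prefer-node1" autostart="1" recovery="relocate">
    <!-- ip, fs, script resources go here -->
  </service>
</rm>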
Quote:
Originally Posted by arunabh_biswas
5) Does GFS2 have any feature that eliminates the requirement for a qdisk?
No. GFS2/GFS, or cluster filesystems in general, are a completely different part of the story.
The quorum disk tries to give you a third vote and therefore "emulates" a third node. Additionally, you can add heuristics, and the qdisk checks access to the shared storage.
A cluster filesystem gives you a filesystem that can be read and written by multiple nodes concurrently, with a consistent view at the filesystem level. That's basically it.
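To illustrate the vote arithmetic (numbers only, not a complete cluster.conf): without a qdisk a two-node cluster runs in the special two_node mode, while with a qdisk the disk contributes the third vote:

Code:
<!-- two nodes, no qdisk: special case, a single node stays quorate with 1 of 2 votes -->
<cman two_node="1" expected_votes="1"/>

<!-- two nodes plus qdisk: 1 + 1 + 1 votes, quorum needs 2 -->
<cman two_node="0" expected_votes="3"/>
<quorumd votes="1" label="qdisk"/>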
 
05-27-2011, 05:20 AM   #8
arunabh_biswas, Member (Original Poster)
Thanks a lot to both of you...

Now I understand, and it has made my vision clear for implementing the RH cluster as per the requirement. Again, thanks to all of you.

But I'll keep this thread open for a few days to let other people get back with their suggestions.

Thanks again. :-)
 
06-10-2011, 02:32 PM   #9
arunabh_biswas, Member (Original Poster)
Hi again,

I've installed the HPDM v4.4.1 multipathing software provided with the HP server (PSP). It overrides the default /etc/multipath.conf file and configures it automatically. By default I get all the multipathed devices in the /dev/mapper directory (i.e. mpath0, mpath1, and so on).

As I've mentioned in my previous posts, I'm implementing a 2-node active/passive cluster on RHEL 5.5, with multiple LUNs (the same ones) mapped to both cluster nodes. I'm creating LVM volumes using the /dev/mapper/mpathX devices, and I'm able to create them without any hurdle. I can see these LVM volumes reflected on the other node using pvdisplay, lvdisplay and vgdisplay (I also ran pvscan, lvscan and vgscan so that the 2nd node would pick up the LVM changes). But the LVM volume device files are not created in the /dev/mapper directory on the 2nd node, whereas they are present in that directory on the 1st node. I've checked after restarting the multipathd service and also after rebooting the server.

One more thing I want to add: I can delete or modify those LVM volumes from the 2nd node.

At this point, my question is: why is this happening? Will it have a significant impact on the cluster if the primary node (where I created the LVM volumes and where the device files are physically present) goes down?

How can I make the same device files (the LUN/LVM devices created via HPDM multipathing on the primary node) physically available on both nodes?
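To be concrete, this is roughly the sequence (a sketch; the device, the size and the vg_data / lv_data names are placeholders for what I actually used):

Code:
# on node 1
pvcreate /dev/mapper/mpath0
vgcreate vg_data /dev/mapper/mpath0
lvcreate -L 50G -n lv_data vg_data    # /dev/mapper/vg_data-lv_data appears on node 1

# on node 2
pvscan; vgscan; lvscan                # metadata shows up in pvdisplay/vgdisplay/lvdisplay
ls /dev/mapper/vg_data-lv_data        # ...but this device node is missing here; it only
                                      # appears after the VG is activated locally, e.g.
                                      # vgchange -ay vg_data (should the cluster be doing
                                      # that activation for me on failover?)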


Kindly suggest.

Thanks.

Arunabh
 
  

