Quote:
Originally Posted by elcody02
On SAN Fencing:
Basically, the SAN fencing implemented by Red Hat in the fence agents fence_brocade, fence_sanbox2, fence_mcdata and fence_vixel works more or less as follows.
When a node is elected to be fenced, the fencing node logs in to the management interface of the SAN switch and disables the port(s) the fenced node is connected to. The idea behind fencing is to stop nodes that are not functioning properly from corrupting data - nothing else. And that is *ONLY* what these agents do. The powering off and on of machines by other fencing methods is only a (nice) side effect of fencing.
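For illustration, the fence section of cluster.conf for a switch-based agent like fence_brocade might look roughly like this (the switch address, credentials, and port number below are made-up example values; "port" is the switch port the node's HBA is attached to):

```xml
<clusternode name="node1" nodeid="1">
  <fence>
    <method name="1">
      <!-- example only: disable switch port 3 to fence node1 -->
      <device name="brocade1" port="3"/>
    </method>
  </fence>
</clusternode>

<fencedevices>
  <!-- example switch IP and login - substitute your own -->
  <fencedevice agent="fence_brocade" name="brocade1"
               ipaddr="192.168.0.10" login="admin" passwd="secret"/>
</fencedevices>
```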
BTW, SAN fencing has some disadvantages. First, some agents (fence_brocade) log in to the switch via telnet. That means only one fencing operation can run at a time, and if somebody else is already logged in via telnet, fencing will fail. Second, and this holds for all fence agents except fence_scsi: this kind of fencing does not work when your cluster is spread over two data centers and one data center goes down. That's a common risk you have to be aware of and accept.
On fence_scsi and quorum disk:
When using two synchronously mirrored storage devices, a quorum disk, and fence_scsi, this problem can be eliminated.
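A quorum disk is configured with a quorumd stanza in cluster.conf; a minimal sketch might look like this (the label and timing values are example values, not recommendations):

```xml
<!-- example only: heartbeat every 1s, node declared dead after 10 misses -->
<quorumd interval="1" tko="10" votes="1" label="qdisk"/>
```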
The concept behind fence_scsi (in a few words) is that every node in the cluster registers itself with the storage device when booting up (this is done via SCSI persistent reservations) so that it is allowed to issue I/Os to the storage device. The node then refreshes its registration during its lifetime.
When a node is elected to be fenced, its registration is revoked by the fencing node; the fenced node thereby loses its SCSI reservation and with it the right to access the storage device.
For more information see man fence_scsi; it covers this concept in more detail.
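The underlying mechanism can be observed manually with sg_persist from the sg3_utils package (the device name and keys below are made-up examples; fence_scsi manages the keys itself, so these commands are only for illustration):

```shell
# node1 registers its (example) key 0x1 with the shared device
sg_persist --out --register --param-sk=0x1 /dev/sdb

# list all keys currently registered with the device
sg_persist --in --read-keys /dev/sdb

# the fencing node (key 0x2) preempts node1's registration (key 0x1),
# revoking node1's right to access the device
sg_persist --out --preempt --param-rk=0x2 --param-sk=0x1 /dev/sdb
```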
Thanks for the answer. We will try to set up iLO fencing or SCSI fencing. One thing I have noticed on reboot: when fencing is not configured, it takes a long time for fenced to time out while the system is starting the cluster, and Conga displays conflicting information regarding node status and cluster membership.
Do you find it easier to modify cluster.conf directly versus doing it with Conga from the luci server?