Pacemaker Cluster iSCSI Target Does Not Stop Gracefully

dcapone2004 · 06-13-2018, 09:36 PM

I have configured an iSCSI HA cluster using DRBD, Pacemaker, and Corosync, with the tgt target implementation on CentOS 7. I tried the LIO-T implementation first. However, for some reason its write performance was three times lower than tgt's. After some googling on that issue, I learned that LIO apparently has not yet been fully performance-tuned, and that there are issues if certain settings do not match exactly between initiator and target.

Anyway, onto my issue...

If I attempt to "gracefully" move the iSCSI resources, or put one of the nodes into standby mode so the resources move over gracefully that way, it ends up being a nightmare: a lot of pcs resource cleanups have to be issued before the iSCSITarget resources eventually stop.

After running pcs resource debug-stop (and combing through the corosync log file), the issue appears to be that tgtadm, which I understand pcs calls on the back end to shut down the target, does not release the target because iSCSI initiator connections to it are still established. The exact output is:

> stderr: tgtadm: this target is still active
> stderr: WARNING: Failed to remove target <target iqn>, retrying.
> stderr: tgtadm: this target is still active
> stderr: WARNING: Failed to remove target <target iqn>, retrying.
> stderr: tgtadm: this target is still active
> stderr: WARNING: Failed to remove target <target iqn>, retrying.
> stderr: tgtadm: this target is still active
> stderr: WARNING: Failed to remove target <target iqn>, retrying.
> stderr: tgtadm: this target is still active

The stop fails, the resource goes into a failed state, and I am then forced to run cleanup repeatedly until the resource finally stops and moves over. This is obviously far from ideal and significantly undermines the HA of the solution, because the process takes long enough that my initiator connections drop and have to be reconnected manually.

Is there something missing from my configuration that would force tgtadm to close any connections immediately? How should this situation be handled?

If it matters, the connecting initiators are all Microsoft Windows Server 2016 iSCSI initiators.

Any assistance in resolving this would be greatly appreciated.