eantoranz 01-02-2013 03:29 PM

pacemaker - iscsi: failing to do a mount but works when cleaned up

I'm experimenting with iSCSI and Pacemaker. I have a SAN that holds 3 partitions. I have an ocf:heartbeat:iscsi resource for it and it works like a charm... well, mostly.

I have noticed that if I try to migrate the group that holds everything together, it moves the resources successfully up to the SAN, and then the following resource (which is a mount point for one of its partitions) fails. In the log I can see this:


Jan  2 16:41:22 cluster2 Filesystem[4755]: WARNING: Couldn't find device [/dev/disk/by-uuid/aac80604-cba5-4fb8-a7f5-a29187c51f52]. Expected /dev/??? to exist
Jan  2 16:41:22 cluster2 cib: [4754]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.Q6XEcr (digest: /var/lib/heartbeat/crm/cib.QeHMGM)
Jan  2 16:41:22 cluster2 lrmd: [2735]: WARN: Managed datapostgres:monitor process 4755 exited with return code 7.

Now, if I try to do a cleanup from either of the nodes, it won't work. The interesting part is _why_ it won't work. Instead of just trying to start the resource from that point (with the SAN already active on the appointed node), it brings everything down, then starts everything up to the SAN, then tries the mount point, and that fails. Then it tries to migrate to the other node, where it fails with the same sequence, and then it comes back to the node where it was before the cleanup, activating everything up to the SAN (it won't try to mount again).

The funny part is that if I stop the inactive node (by stopping corosync) and then do the cleanup, it works (unlike when both nodes are up, it doesn't bring the SAN resource down). Given that the failed attempts happen when the SAN has been activated right before the mount point, I think that by the time the mount is attempted, the OS hasn't probed the SAN's devices yet, so the devices on it are still missing. Is there a way to make the SAN resource wait for the OS to probe it, or something along those lines?

Thanks in advance.

eantoranz 01-02-2013 03:50 PM

Got it. Added an ocf:heartbeat:Delay resource that runs right after the SAN starts.
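
For anyone landing here later, a rough sketch of what that kind of setup might look like in crm shell syntax. This is not my exact configuration: the portal address, target IQN, mount directory, filesystem type, the 10-second delay, and the names san, san_settle and g_storage are all placeholders. Only datapostgres and the by-uuid device path come from the log above. It assumes the resources live in a group, so they start in the order listed.

```
# Sketch only: portal, target, directory, fstype and the delay value are
# placeholder assumptions, not values from the real cluster.
primitive san ocf:heartbeat:iscsi \
    params portal="192.0.2.10:3260" target="iqn.2013-01.com.example:san"

# Pause after the iSCSI login so the OS has time to probe the new devices.
# Other Delay parameters are left at their defaults.
primitive san_settle ocf:heartbeat:Delay \
    params startdelay="10"

primitive datapostgres ocf:heartbeat:Filesystem \
    params device="/dev/disk/by-uuid/aac80604-cba5-4fb8-a7f5-a29187c51f52" \
        directory="/var/lib/postgresql" fstype="ext4"

# Group members start in the listed order: san, then the delay, then the mount.
group g_storage san san_settle datapostgres
```

A fixed delay is a blunt tool: it only helps if the devices show up within that time, so the value probably needs tuning per setup.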

