Quote:
Originally Posted by Demosa
your monitor interval is longer than your timeout intervals, so it sounds like you're running into Fencing.
Try changing the start and stop timeouts to something longer, and the monitor interval shorter, see if that fixes it. (30s monitor, 2m start/stop)
|
It works differently:
monitor interval is how often the cluster runs the health check command.
start/stop timeout is how long the cluster waits for a response from the script. If the script does not return a result within this period, the cluster treats it as a failure.
So these parameters do not affect each other.
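To make the distinction concrete, here is a sketch of how the two settings might be written in a crm shell configuration. It assumes crmsh is the management tool and that the init script is installed under the hypothetical name my_application; the values are the ones suggested in the quote above.

```
# monitor interval: how often the health check runs
# start/stop timeout: how long the cluster waits for each action to return
primitive My_application lsb:my_application \
    op monitor interval=30s \
    op start timeout=120s interval=0 \
    op stop timeout=120s interval=0
```

The interval belongs to the recurring monitor operation, while each timeout is set on its own operation, which is why they can be tuned independently.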
As for my problem, I've found the cause.
I had implemented a "monitor" command for the health check, while the LSB standard expects "status" instead. I found this by looking at corosync.log:
Quote:
Initiating monitor operation My_application_monitor_0 locally on Node2
Result of monitor operation for My_application on Node2: 7 (not running)
Initiating start operation My_application_start_0 locally on Node2
Result of start operation for My_application on Node2: 0 (ok)
Initiating monitor operation My_application_monitor_30000 locally on Node2
Result of monitor operation for My_application on Node2: 7 (not running)
Initiating stop operation My_application_stop_0 locally on Node2 | action 2
Result of stop operation for My_application on Node2: 0 (ok)
|
Pacemaker mainly works with the OCF standard, where the "monitor" command is used, but for LSB resources it sends "status" and still writes "monitor" to the log file.
I just renamed the "monitor()" function to "status()" and the resource became stable. This was my own mistake, since the LSB standard does not define a "monitor" command.
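For anyone hitting the same thing, below is a minimal sketch of an LSB-style init script with a "status" action. It is not my actual script: the PIDFILE and DAEMON paths and the is_running helper are placeholders chosen for illustration. The sketch also simplifies the LSB status codes to 0 (running) and 3 (not running).

```shell
#!/bin/sh
# Hypothetical LSB init script sketch for "my_application".
# PIDFILE and DAEMON are placeholder values, not taken from a real setup.
PIDFILE="${PIDFILE:-/var/run/my_application.pid}"
DAEMON="${DAEMON:-/usr/local/bin/my_application}"

# Succeeds if the process recorded in the PID file is alive.
is_running() {
    [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
}

start() {
    is_running && return 0      # starting a running service is a success
    "$DAEMON" &
    echo $! > "$PIDFILE"
}

stop() {
    if is_running; then
        kill "$(cat "$PIDFILE")"
    fi
    rm -f "$PIDFILE"
    return 0                    # stopping a stopped service is a success
}

# LSB "status": 0 = running, 3 = not running (simplified).
status() {
    if is_running; then
        echo "my_application is running"
        return 0
    fi
    echo "my_application is not running"
    return 3
}

main() {
    case "$1" in
        start)  start ;;
        stop)   stop ;;
        status) status ;;
        *)      echo "Usage: $0 {start|stop|status}" >&2; return 2 ;;
    esac
}

# Dispatch only when an action is given, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    main "$@"
    exit $?
fi
```

Note that there is no "monitor" branch here: a script that only answers to "monitor" falls through to the usage case when it receives "status", and the cluster sees a non-zero result for the health check.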