
mozer 08-11-2020 05:35 AM

Corosync/pacemaker fencing issue
 
Hello all,

I've created a Corosync/Pacemaker cluster on CentOS 8 with 3 VMware nodes and everything runs as expected. I configured a floating IP among them, and it responds well:

Quote:

pcs status

Full List of Resources:
* Cluster_VIP (ocf::heartbeat:IPaddr2): Started esba-cl-test-01
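
For reference, a floating IP like this is typically created with something along these lines (the address, netmask and monitor interval below are placeholders, not my exact values):

Code:

    # pcs resource create Cluster_VIP ocf:heartbeat:IPaddr2 ip=192.168.25.50 cidr_netmask=24 op monitor interval=30s
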
Now I want to set up fencing for the cluster, following this guide (among others).

With the following command:

Quote:

pcs stonith create vmfence fence_vmware_rest pcmk_host_map="esba-cl-test-01:vm-1;esba-cl-test-02:vm-2;esba-cl-test-03:vm-3" ipaddr=192.168.25.45 ssl=1 login=XXXX passwd=XXXX ssl_insecure=1
Everything goes well, no errors

Quote:

Full List of Resources:
* Cluster_VIP (ocf::heartbeat:IPaddr2): Started esba-cl-test-01
* vmfence (stonith:fence_vmware_rest): Started esba-cl-test-02
The problem comes when I try to test the fencing:

Quote:

stonith_admin --reboot esba-cl-test-03
I receive

Quote:

pacemaker-fenced[9403]: error: Operation 'reboot' targeting esba-cl-test-03 on <no-one> for stonith_admin.745607@esba-cl-test-01.8e1c6371: No route to host
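
As a side note, a way to see which devices the fencer has registered, and which ones it considers able to fence a given node, is something like the following (a sketch, not a capture of what I actually ran):

Code:

    # stonith_admin --list-registered
    # stonith_admin --list esba-cl-test-03
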
This problem has been reported on the Red Hat forums, but unfortunately I don't have an account, so I can't check the solution.

I don't have DNS configured; the servers resolve each other via the hosts file, but that should be enough.
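
The entries look roughly like this on every node (the addresses here are illustrative, not my real ones):

Code:

    # illustrative /etc/hosts entries, present on all three nodes
    192.168.25.11   esba-cl-test-01
    192.168.25.12   esba-cl-test-02
    192.168.25.13   esba-cl-test-03
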

Has anyone encountered this problem?
Can anyone please help?


Thanks

tshikose 08-23-2020 05:12 PM

Hi,

This is taken from the link you provided above.
I split it up so it doesn't become one long post.


Environment

Red Hat Enterprise Linux (RHEL) 7 Update 5
Red Hat Enterprise Linux (RHEL) 8
Pacemaker High Availability or Resilient Storage Add On
VMware vSphere version 6.5 and above.

tshikose 08-23-2020 05:14 PM

Resolution

Assuming the following cluster architecture:
the cluster node hostnames are node1 and node2
the cluster node names as seen by the VMware hypervisor (ESXi/vCenter) are node1-vm and node2-vm
<ESXi/vCenter IP address> is the IP address of the VMware hypervisor managing the cluster node VMs

First, check whether a cluster node is able to reach the hypervisor and list the VMs on it. The following command tries to connect to the hypervisor with the provided credentials and lists all machines.

Code:

    # fence_vmware_rest -a <ESXi/vCenter IP address> -l <esxi_username> -p <esxi_password> --ssl-insecure -z -o list | egrep "(node1-vm|node2-vm)"
    node1-vm,
    node2-vm,
    # fence_vmware_rest -a <ESXi/vCenter IP address> -l <esxi_username> -p <esxi_password> --ssl-insecure -z -o status -n node1-vm
    Status: ON

If the listing above fails, make sure the following are true:
The node is able to communicate with ESXi/vCenter on port 443/tcp (when using SSL) or on port 80/tcp (without SSL); a quick connectivity check is sketched after this list.
The user has permissions on ESXi/vCenter for fencing.
The ESXi/vCenter has a trusted SSL certificate. If the certificate cannot be trusted, check the solution on how to relax some of the SSL checks.
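
A quick way to check the TCP connectivity from a cluster node is something like this (assuming curl and nc are installed; -k just skips certificate verification):

Code:

    # curl -kIs https://<ESXi/vCenter IP address>/
    # nc -zv <ESXi/vCenter IP address> 443
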

tshikose 08-23-2020 05:15 PM

If the command succeeded, the node is able to communicate with the hypervisor. The stonith device should be configured using the same configuration options that were tested in the listing. Some arguments of the fence_vmware_rest command and of the fence_vmware_rest fencing agent in Pacemaker can have slightly different names.
For this reason, check the help pages of both the fence_vmware_rest command and the fence_vmware_rest fencing agent (the Diagnostics section contains a shortened listing of the options used by this solution).
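
For example (a sketch; exact subcommands can differ between pcs versions), the two option listings can be compared with:

Code:

    # fence_vmware_rest --help
    # pcs stonith describe fence_vmware_rest
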

Create the stonith device using the command below. The pcmk_host_map attribute maps the node hostname as seen by the cluster to the name of the virtual machine as seen by the VMware hypervisor.

In each pcmk_host_map entry, the part before the colon is the cluster node name as it appears in the /etc/corosync/corosync.conf file, and the part after the colon is the node name as seen by the VMware hypervisor; entries are separated by semicolons.

Code:

    # cat /etc/corosync/corosync.conf
    [...]
    nodelist {
        node {
            ring0_addr: node1  <<<=== Cluster node name
            nodeid: 1
        }

        node {
            ring0_addr: node2
            nodeid: 2
        }
    }

    # pcs stonith create vmfence fence_vmware_rest pcmk_host_map="node1:node1-vm;node2:node2-vm" ipaddr=<ESXi/vCenter IP address> ssl=1 login=<esxi_username> passwd=<esxi_password> ssl_insecure=1

To check the status of the stonith device and its configuration, use the commands below.

Code:

    # pcs stonith show
    Full list of resources:
    vmfence (stonith:fence_vmware_rest):    Started node1

    # pcs stonith show vmfence --full
    Resource: vmfence (class=stonith type=fence_vmware_rest)
      Attributes: pcmk_host_map=node1:node1-vm;node2:node2-vm ipaddr=<ESXi/vCenter IP address> ssl=1 login=<esxi_username> passwd=<esxi_password> ssl_insecure=1

When the stonith device is started, proceed with proper testing of fencing in the cluster.
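
One way to run such a test, sketched here, is to fence a node from another node and confirm it actually reboots (pick a node that is safe to reboot; either command should trigger a fence):

Code:

    # pcs stonith fence node2
    # stonith_admin --reboot node2 --verbose
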

Additional notes and recommendations:

Make sure package fence-agents-4.0.11-86.el7 or later is installed, which provides the new fence_vmware_rest agent (a quick check is sketched after this list).
fence_vmware_rest works with VMware vSphere version 6.5 or higher.
Please refer to the following link for the support policies of fence_vmware_rest.
Once configured, it is highly recommended to test the fence functionality.
The older fence_vmware_soap agent causes CPU usage to spike.
There is a known limitation imposed by the VMware REST API of 1000 VMs: fence_vmware_rest monitor fails with the error "Exception: 400: Too many virtual machines. Add more filter criteria to reduce the number."
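
A quick way to confirm the package and the agent are present on a node (package names can differ slightly between releases):

Code:

    # rpm -qa | grep fence-agents
    # which fence_vmware_rest
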

tshikose 08-23-2020 05:16 PM

A final note: not being a VMware user myself, I cannot help more than copying and pasting as I did.
I hope it will help.

