LinuxQuestions.org (/questions/)
-   Linux - Server (https://www.linuxquestions.org/questions/linux-server-73/)
-   -   Sles 11 cluster (heartbeat) shared file sys config (https://www.linuxquestions.org/questions/linux-server-73/sles-11-cluster-heartbeat-shared-file-sys-config-909469/)

scopiansl 10-22-2011 01:26 AM

Sles 11 cluster (heartbeat) shared file sys config
 
Hello,

I have two HP DL585 servers with identical hardware configuration and SAN storage.
We were planning to go live with a cluster, but the go-live deadlines forced us to configure only one server with the SAN.
We now run SAP on one server with SLES 11 SP1. All the SAP-related partitions are on the SAN, and the OS partitions (and oraarch) are on the server's local file system. The new server also has SLES 11 SP1 installed, and both servers can see the SAN.
Now the SAP Basis consultant has asked me to configure the OS-level cluster, which will let him use the OS cluster services and configure the SAP cluster on top. So I need to start configuring the cluster, and I need some information to make sure everything goes smoothly.
This is what I plan to do.
I am going to configure Heartbeat for the cluster setup, with the following steps.

1. Edit /etc/hosts for server communication
2. Connect both servers using cross cable
3. Install heartbeat in the current production system (heartbeat-1:~ # yast2 heartbeat)
4. Install heartbeat in the new server (heartbeat-1:~ # yast -i heartbeat)
5. Define the communication channels (bind network, redundancy)
6. Define the authentication settings on the production server
7. Transfer the config to the new server (with csync2 for synchronization; see the csync2.cfg sketch below)
8. Start the initial synchronization (csync2 -xv)
9. Configure OpenAIS to start the cluster services automatically at boot on both servers
10. Start the cluster on both servers (/etc/init.d/heartbeat start)
11. Set a password for the hacluster user (heartbeat-1:~ # passwd hacluster)
12. Configure the cluster (heartbeat-1:~ # hb_gui)
13. Configure the global cluster options (with STONITH enabled)
14. Configure the cluster resources (hb_gui / Pacemaker) and add the IP addresses and file systems
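
For steps 7 and 8, this is roughly the csync2 setup I have in mind (only a sketch; the host names, key path and file list are placeholders for whatever we end up using, and the key has to be generated first with csync2 -k):

Code:

# /etc/csync2/csync2.cfg -- sketch only; adjust host names, key and file list
# generate the key once with: csync2 -k /etc/csync2/key_hacluster
group ha_group
{
        host heartbeat-1 heartbeat-2;
        key /etc/csync2/key_hacluster;

        include /etc/ha.d/;
        include /etc/corosync/corosync.conf;
}

After the key is copied to the other node, the csync2 -xv in step 8 should push these files across.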

However, I need to resolve the following concerns first. Please help me with them.

1. Do I need to configure a trust relationship between the two servers' root users?
2. I am confused about file system clustering. Do I need to add the file system as a cluster resource, or will that be handled on the SAP Basis side? If so, what steps do I have to go through?
3. Please give me an explanation of how clones come into the picture.
4. Do I need to change the file system to OCF RAs? Can I do it without formatting my current file system? (LVM/ext3)
5. Other than the server-to-server communication, do I need to configure a logical address on both servers for user access? If yes, how and when do I do it?
6. When does quorum come into the picture? When do I need to configure it?
* Is there anyone who can help me online when the configuration starts?

I know it's a lot of reading, but many thanks if you can help me.

Regards,

CCIlleperuma.

rodrifra 10-24-2011 02:33 AM

First of all, I'm not an expert in clustering. I just set up my own cluster (active/active) and went through my share of trouble at the time, so the only help I can give you is my experience.

When you refer to heartbeat I guess you are talking about heartbeat/pacemaker: heartbeat is the lower layer and pacemaker is the one in charge of managing the cluster.

You can save yourself a lot of pain reading the documentation here.

I'll try to answer your questions:
Quote:

1. Do I need to configure a trust relationship between the two servers' root users?
To save you some time, you can add "autojoin any" to your ha.cf file so that any new node can join, and create a sha1 key in your authkeys to keep unwanted nodes out.
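
Something like this, for example (just a sketch; the passphrase is a placeholder, and authkeys must be mode 600 or heartbeat will refuse to start):

Code:

# /etc/ha.d/ha.cf (excerpt)
autojoin any

# /etc/ha.d/authkeys
auth 1
1 sha1 your_shared_passphrase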
Quote:

2. I am confused about file system clustering. Do I need to add the file system as a cluster resource, or will that be handled on the SAP Basis side? If so, what steps do I have to go through?
If you are using heartbeat/pacemaker, the resource agents are located in /usr/lib/ocf/resource.d. There you will find heartbeat and pacemaker directories. You can create one for your own scripts and configure pacemaker to use it. In /var/lib/heartbeat/crm you will find the crm configuration and epoch files. You MUST NEVER touch these files directly: either change your cluster/node configuration from crm itself, or create a file and load it with cibadmin.
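
For example, to add a resource you have described in an XML file of your own instead of touching those files (the file name here is just an example), something along these lines:

Code:

# create the resource described in the file under the <resources> section
cibadmin -o resources -C -x /root/my-resource.xml

# or edit the live configuration interactively with the crm shell
crm configure edit

cibadmin will refuse XML that does not validate, which is a lot safer than hand-editing anything under /var/lib/heartbeat/crm.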
Quote:

3. Please give me an explanation of how clones come into the picture.
You can have two types of configuration: active/active, where services run on both machines and, if a service fails, it is started on the other machine (or, if the whole machine fails, all of its services are started on the other one); or active/passive, where one machine is on standby just waiting for the other to fail before it takes over. When a machine appears to have failed, you have to be absolutely sure it really has failed; otherwise you can end up with two machines up and running the same service, most likely with the same IP. For those cases pacemaker has STONITH (shoot the other node in the head), a mechanism to make sure the other machine is stopped before taking over its services.
Quote:

4. Do I need to change the file system to OCF RAs? Can I do it without formatting my current file system? (LVM/ext3)
That question was answered in the second one: the OCF scripts are in /usr/lib/ocf/resource.d.
Quote:

5. Other than the server-to-server communication, do I need to configure a logical address on both servers for user access? If yes, how and when do I do it?
The usual approach (what we have here with AIX machines and what I have done in my Linux cluster) is to have a service IP and a maintenance IP. One IP is static on the machine and is the one you use for admin tasks; the other is the one that jumps from node to node when one of them fails.
You set the maintenance IP the way you always do in Linux, and use the script provided by heartbeat, /usr/lib/ocf/resource.d/heartbeat/IPaddr, to let the cluster assign the service IP to the right machine.
You will have to configure the cib to use that script in the following way:

Code:

<resources>
  <primitive class="ocf" id="IPCaronte" provider="heartbeat" type="IPaddr"> #this is one node
    <instance_attributes id="IPCaronte_Attr">
          <nvpair id="IPCaronte-ip" name="ip" value="10.50.1.200"/>
          <nvpair id="IPCaronte_Attr-cidr_netmask" name="cidr_netmask" value="255.255.240.0"/>
    </instance_attributes>
    <operations>
          <op id="IPCaronte-startup" interval="0" name="monitor" timeout="90s"/>
          <op id="IPCaronte-start" interval="0" name="start" timeout="90s"/>
          <op id="IPCaronte-stop" interval="0" name="stop" timeout="90s"/>
    </operations>
  </primitive>
  <primitive class="ocf" id="IPMusicas" provider="heartbeat" type="IPaddr">  #this is the other one
        <operations>
          <op id="IPMusicas-startup" interval="0" name="monitor" timeout="90s"/>
          <op id="IPMusicas-start" interval="0" name="start" timeout="90s"/>
          <op id="IPMusicas-stop" interval="0" name="stop" timeout="90s"/>
        </operations>
        <instance_attributes id="IPMusicas_Attr">
          <nvpair id="IPMusicas-ip" name="ip" value="10.50.1.199"/>
          <nvpair id="IPMusicas_Attr-cidr_netmask" name="cidr_netmask" value="255.255.240.0"/>
        </instance_attributes>
  </primitive>
</resources>
<rsc_location rsc="IPMusicas" score="0" id="MusicasEnCaronte" node="caronte"/>
<rsc_location rsc="IPCaronte" score="INFINITY" id="CaronteEnCaronte" node="caronte"/>
<rsc_location rsc="IPCaronte" score="0" id="CaronteEnMusicas" node="Musicas"/>
<rsc_location rsc="Musica" score="INFINITY" id="MusicaEnMusicas" node="Musicas"/>

The first lines specify the script to be run and its parameters.
The last four lines specify where each IP should run: the higher the score, the stronger the preference for that node, so INFINITY pins each IP to its home node, while 0 still allows it to run on the other node as a fallback when its home node goes down.
Quote:

6. When does quorum come into the picture? When do I need to configure it?
You don't have to worry much about quorum; the nodes will elect one of them as the DC.
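
That said, on a two-node cluster one property you will probably want to look at (double-check it against the documentation, this is just how I understand it) is no-quorum-policy, since two nodes can never out-vote each other once one of them is gone:

Code:

# keep resources running even when the two-node cluster loses quorum
crm configure property no-quorum-policy="ignore"

# show the current cluster options
crm configure show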

I hope I have cleared some things up, but as I said at the beginning, the documentation will save you A LOT of pain.

scopiansl 10-24-2011 07:20 AM

Thanks, that was very helpful.
I am still not certain about the file system, though: even if I configure it in /usr/lib/ocf/resource.d, do I need to change the file system of my database to OCF? It is currently ext3.
Can you please give me the steps you went through to configure your cluster (in the simplest form, as I am a beginner in Linux)?

regards,

ccilleperuma

rodrifra 10-24-2011 08:22 AM

I'm glad I could help ;)

What do you mean by 'change the file system of my database to OCF'?

OCF is not a file system, it is a standard for scripts. I mean, you have to create a script compliant with OCF if you want it to run in your cluster. OCF seems to be an LSB upgrade for cluster scripts (the /etc/init.d files are an example of LSB scripts); they have to accept start, stop, reload, and so on. As you can see here, pacemaker accepts three types of scripts: those compatible with LSB, those compliant with OCF, and legacy heartbeat resource agents (even though I think the only ones that worked for me were the OCF-compliant ones).
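
To give you an idea, a bare-bones OCF-style script looks roughly like this ("myservice" is a placeholder, and a real agent needs a proper meta-data section and validation; this is only a sketch):

Code:

#!/bin/sh
# minimal OCF-style resource agent sketch; pacemaker calls it as:
#   <agent> start | stop | monitor | meta-data
case "$1" in
  start)
    /etc/init.d/myservice start && exit 0     # 0 = OCF_SUCCESS
    exit 1                                    # 1 = OCF_ERR_GENERIC
    ;;
  stop)
    /etc/init.d/myservice stop && exit 0
    exit 1
    ;;
  monitor)
    if /etc/init.d/myservice status >/dev/null 2>&1; then
      exit 0                                  # running
    else
      exit 7                                  # 7 = OCF_NOT_RUNNING
    fi
    ;;
  meta-data)
    echo '<?xml version="1.0"?><resource-agent name="myservice"/>'   # stub only
    exit 0
    ;;
  *)
    exit 3                                    # 3 = OCF_ERR_UNIMPLEMENTED
    ;;
esac

Pacemaker runs the monitor action periodically, so those exit codes are what tell it whether the resource is healthy.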

In a quick summary, the steps to follow to start a simple cluster are:

1) Create your /etc/ha.d/ha.cf file with the options you want on both machines (or on however many nodes you will be running).

Here is an example of my ha.cf:

Code:

logfacility local6
keepalive 500ms
warntime 2
deadtime 4
initdead 10
bcast eth1 eth0
udpport 694
node Musicas Caronte
auto_failback on
ping 10.50.1.1
compression bz2
compression_threshold 2
crm respawn
autojoin any

Check the manual to see what does what, and adjust the parameters to your needs.

2) Add to your node the piece of configuration I posted in my previous post. That is the simplest node configuration: a cluster where an IP moves from one node to another. Use cibadmin, for instance, to load the configuration into your node.
3) Start your node with /etc/init.d/heartbeat start (I'm running it on Debian). Check the status of the cluster with crm_mon and wait for it to put BOTH service IPs on your machine (note that you have a one-node cluster for now).
4) Start the second node. If ha.cf is correct on both computers, it should connect to the cluster and receive the configuration, so as soon as it starts, one of the IPs should disappear from the first node and move to the second one. Use crm_mon to see what is happening with the cluster (see the command sketch below).
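
Roughly, the commands for steps 2) to 4) look like this (the file names are just examples, one holding the <primitive> elements and one the <rsc_location> constraints; note that the cluster has to be running before cibadmin can talk to the CIB, so in practice you start the node first and then load the configuration):

Code:

# on node 1: start the cluster first
/etc/init.d/heartbeat start
crm_mon -1                                   # wait until the node shows as online

# load the <primitive> definitions and the <rsc_location> constraints
cibadmin -o resources   -C -x /root/ip-resources.xml
cibadmin -o constraints -C -x /root/ip-constraints.xml
crm_mon -1                                   # both service IPs should end up here

# on node 2, once ha.cf and authkeys match node 1
/etc/init.d/heartbeat start
crm_mon                                      # one of the IPs should move over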

Of course, you will have to change node names, IPs and masks in the configuration files for your own setup.

I almost forgot. If you have a shared resource on your network and you mount it on /usr/lib/ocf/resource.d on both machines, the same scripts will be accessed. Otherwise, you will have to copy the OCF scripts you create to both machines. Be aware that those scripts are a crucial element of your cluster, so if access to that shared resource is lost, your cluster is doomed. Find a way to replicate those files and keep a local copy on both machines.
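
Something as simple as copying them over after every change will do (the directory name is just an example):

Code:

# push your custom agents to the other node whenever you change them
scp -r /usr/lib/ocf/resource.d/myagents/ other-node:/usr/lib/ocf/resource.d/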

EDIT NOTE: bcast should list both interfaces (in /etc/ha.d/ha.cf). Otherwise, if the link between the two machines goes down, both machines will start all the services, and we don't want that; if the direct link goes down BUT both machines still have network connectivity, the cluster should keep working normally (except for the error message in the log).

scopiansl 10-26-2011 04:56 AM

I started with two PCs for testing purposes before touching the production server. I have another point to clear up. The documentation says STONITH is usually implemented as a remote power switch. So do I have to purchase one? Or can I implement it as a service on a non-cluster server? What are the risks of not implementing it? Is there any other method to do it?


Regards,

ccilleperuma.

rodrifra 10-26-2011 08:45 AM

I don't know much about STONITH; I have it disabled, since my cluster was a test project and the services running there are not critical for the company. As far as I can recall, STONITH needs some kind of hardware because it is a mechanism to physically shut down the machine. I can only point you to the documentation here.
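
For what it's worth, on my test cluster I simply switched it off with the cluster property below; that is fine for playing around, but I would not recommend it for a production SAP cluster:

Code:

# disable STONITH on a test cluster only
crm configure property stonith-enabled="false"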

About not implementing STONITH... the risk you face depends on the services running on your nodes. As I said before, you could end up with two machines running at the same time with the same IPs. You could also end up with two databases running at the same time on different machines with the same IPs; you can imagine the mess, even the disaster, that could be.

It is up to you to decide how much you want to spend and what services you are going to run. Making sure you won't start both servers (or the services you care about) at the same time can be enough, but... how can you be sure your machine (or the service) is down before starting the other one? You are supposed to have two network connections on each machine: one straight from one machine to the other just for the heartbeat, and another to the network. So if your network goes down, both machines will know it is a network problem and are supposed to recognize that. If the link between the two machines goes down, they are supposed to know it is just the link, since they can still reach the network. I have not tested this thoroughly, so I can't tell whether it works fine. The line "ping 10.50.1.1" in the ha.cf I posted is supposed to be checking the network (it pings a gateway in my case); about the link I don't know.

You will surely have to create your own scripts (OCF compliant, remember) to start, stop and check the status of your resource. If those scripts are correct, they will make sure a service is stopped when it is supposed to be, and report it otherwise if it isn't. That will be enough to avoid having a service running on both nodes (from a software point of view; network problems are on another level).

If you want to be completely sure one node is not started unless the other is down, I guess you will have to get the hardware to set up STONITH. Since you have two machines for testing, implement the simple cluster I pointed out in the other post and, once it is running, disconnect one machine from the network and see how the cluster reacts. Then disconnect the link and see how it reacts; shut one machine down and, again, see how it reacts. After you have the cluster configured correctly, you will be able to decide whether STONITH is necessary for you or not.

Regards.

scopiansl 11-10-2011 05:17 AM

I have configured the cluster in a test environment, but I am still not clear about the IP config. This is what I have done.

I configured local LAN IPs on two NICs (192.100.100.70/71).
I configured IPs on another two NICs (192.168.30.70/71) and connected them with a cross cable.
Both machines can now ssh to each other without a password. The hosts file only has the cluster-connectivity IPs in it.
Then, when I started the cluster config with heartbeat, it asked for the following; my settings are listed as well.

Communication channel:
both given 192.168.30.0
Multicast address:
node 1: 226.0.1.5, node 2: 226.0.1.6 (not clear what this setting is for??)
Multicast port:
both given 5454
Node ID:
node 1: 1, node 2: 2

Still, both servers say they are the DC and show the other one as offline. Please advise me on this.

PS: I have not configured any logical IPs yet.

regards,

ccilleperuma.

rodrifra 11-11-2011 10:06 AM

Hello again.

Could you please post your /etc/ha.d/ha.cf and your /etc/hosts? Could you also post your crm_mon -1 output?

I assume you have IPs 192.100.100.70/71 in eth0 and 192.168.30.70/71 in eth1. Am I right?

I also assume you are using heartbeat/pacemaker and have therefore installed the packages pointed out here (just to make sure we have the same setup).

scopiansl 11-18-2011 01:44 AM

Hi,

Sorry for the delay in replying; I had another issue to solve.

On my cluster servers there is no ha.cf, probably because of the following:

Code:

Ultimately it will change in SLES11, HA will be replaced with OpenAIS and follow the same packaging and naming convention according to the recent changes in the project.
and the openais.conf file says

Code:

# This configuration file is not used any more
# Please refer to /etc/corosync/corosync.conf

So here is the corosync.conf of cluster1:

Code:

aisexec {
        #Group to run aisexec as. Needs to be root for Pacemaker

        group:        root

        #User to run aisexec as. Needs to be root for Pacemaker

        user:        root

}
service {
        #Default to start mgmtd with pacemaker

        use_mgmtd:        yes

        ver:        0

        name:        pacemaker

}
totem {
        #The mode for redundant ring. None is used when only 1 interface is specified; otherwise, only active or passive may be chosen

        rrp_mode:        none

        #How long to wait for join messages in membership protocol. in ms

        join:        60

        #The maximum number of messages that may be sent by one processor on receipt of the token.

        max_messages:        20

        #The virtual synchrony filter type used to identify a primary component. Change with care.

        vsftype:        none

        #The fixed 32 bit value to identify the node to cluster membership. Optional for IPv4, and required for IPv6. 0 is reserved for other usage

        nodeid:        1

        #How long to wait for consensus to be achieved before starting a new round of membership configuration.

        consensus:        4000

        #HMAC/SHA1 should be used to authenticate all message

        secauth:        on

        #How many token retransmits should be attempted before forming a new configuration.

        token_retransmits_before_loss_const:        10

        #How many threads should be used to encrypt and send messages. Only has meaning when secauth is turned on

        threads:        1

        #Timeout for a token lost. in ms

        token:        3000

        #The only valid version is 2

        version:        2

        interface {
                #Network address to be bound for this interface setting

                bindnetaddr:        192.168.30.0

                #The multicast address to be used

                mcastaddr:        226.0.1.5

                #The multicast port to be used

                mcastport:        5454

                #The ringnumber assigned to this interface setting

                ringnumber:        0

        }
        #To make sure the auto-generated nodeid is positive

        clear_node_high_bit:        no

}
logging {
        #Log to a specified file

        to_logfile:        no

        #Log to syslog

        to_syslog:        yes

        #Whether or not turning on the debug information in the log

        debug:        off

        #Log timestamp as well

        timestamp:        on

        #Log to the standard error output

        to_stderr:        yes

        #Logging file line in the source code as well

        fileline:        off

        #Facility in syslog

        syslog_facility:        daemon

}
amf {
        #Enable or disable AMF

        mode:        disable

}

hosts file of cluster1

Code:

#
# hosts        This file describes a number of hostname-to-address
#              mappings for the TCP/IP subsystem.  It is mostly
#              used at boot time, when no name servers are running.
#              On small systems, this file can be used instead of a
#              "named" name server.
# Syntax:
#   
# IP-Address  Full-Qualified-Hostname  Short-Hostname
#

127.0.0.1      localhost

# special IPv6 addresses
::1            localhost ipv6-localhost ipv6-loopback

fe00::0        ipv6-localnet

ff00::0        ipv6-mcastprefix
ff02::1        ipv6-allnodes
ff02::2        ipv6-allrouters
ff02::3        ipv6-allhosts
127.0.0.2      cluster1.cbl cluster1
192.168.30.71      cluster2.cbl cluster2
192.168.30.70  cluster1.cbl cluster1

hosts file of cluster2

Code:

#
# hosts        This file describes a number of hostname-to-address
#              mappings for the TCP/IP subsystem.  It is mostly
#              used at boot time, when no name servers are running.
#              On small systems, this file can be used instead of a
#              "named" name server.
# Syntax:
#   
# IP-Address  Full-Qualified-Hostname  Short-Hostname
#

127.0.0.1      localhost

# special IPv6 addresses
::1            localhost ipv6-localhost ipv6-loopback

fe00::0        ipv6-localnet

ff00::0        ipv6-mcastprefix
ff02::1        ipv6-allnodes
ff02::2        ipv6-allrouters
ff02::3        ipv6-allhosts
127.0.0.2      cluster2.cbl cluster2
192.168.30.70      cluster1.cbl cluster1
192.168.30.71  cluster2.cbl cluster2


This is the output of crm_mon -1
Cluster1

Code:

============
Last updated: Sat Nov 19 02:07:42 2011
Stack: openais
Current DC: cluster1 - partition WITHOUT quorum
Version: 1.1.2-2e096a41a5f9e184a1c1537c82c6da1093698eb5
2 Nodes configured, 2 expected votes
0 Resources configured.
============

Node cluster2: UNCLEAN (offline)
Online: [ cluster1 ]

Cluster2

Code:

============
Last updated: Sat Nov 19 02:07:35 2011
Stack: openais
Current DC: cluster2 - partition WITHOUT quorum
Version: 1.1.2-2e096a41a5f9e184a1c1537c82c6da1093698eb5
2 Nodes configured, 2 expected votes
0 Resources configured.
============

Node cluster1: UNCLEAN (offline)
Online: [ cluster2 ]

I have e0 on both servers configured as 192.100.100.70 and .71, which connects to the outside network,
and e1 on both configured as 192.168.30.70 and .71, connected through a cross cable.

Yes, I have installed both heartbeat and pacemaker, but I have not configured pacemaker yet, as I want to check the cluster connectivity first.
In the heartbeat communication channels I gave 192.168.30.0 as the bind network address, as it only offers the subnets (192.100.100.0 / 192.168.30.0) to select from.
It also asks for a multicast address and port, which I assigned as 226.0.1.5:5454 and 226.0.1.6:5454 respectively.
Cluster1's node ID is 1 and cluster2's is 2.
rrp_mode is none on both, as I don't have redundant channels.

I hope this helps you get an idea of my setup. Thanks very much for the interest you have shown in this issue.

PS: there is a few-seconds time difference between the servers, which can sometimes show up as a one-minute difference.

Regards,

ccilleperuma.

rodrifra 11-18-2011 07:04 AM

Well, first of all, even though you installed pacemaker, you seem to be running it on top of OpenAIS/corosync rather than heartbeat.

Either you continue with corosync and set it up according to this link (I haven't used corosync, so I can't help you with that), or you uninstall corosync and use the heartbeat/pacemaker stack. In my case (I'm on Debian) the packages that were needed were heartbeat, cluster-glue, pacemaker and their dependencies.

If you decide to use heartbeat/pacemaker, you will need an /etc/ha.d/ha.cf file (check my previous posts for reference, but the installation should have generated a sample one) and an /etc/ha.d/authkeys (the installation should have generated this one too). The latter can be as short as two lines containing

Code:

auth 2
2 sha1 your_passphrase_goes_here

In your case, your ha.cf should have the following lines changed with respect to the ha.cf I posted earlier:

Code:

bcast e0 e1
node cluster1 cluster2

The first line is for the heartbeats: you want them to go through both interfaces. Suppose your link gets disconnected or one of those network cards goes down; the cluster will still know the other node is alive because it answers through the network, and no resources will be reallocated, which is what should happen (you should only see a dead-link message in the log).

According to the first lines you posted, your distro seems to be replacing heartbeat with OpenAIS. If the packages don't seem to work, go to the link in my previous post and install manually.
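
If you do stay on OpenAIS/corosync, one thing I would double-check (just from what you posted, so take it as an assumption on my part): the totem interface section has to be identical on both nodes, i.e. the same bindnetaddr, mcastaddr and mcastport, otherwise each node forms its own one-node membership and both will claim to be the DC. Something like this on BOTH machines:

Code:

        interface {
                ringnumber:     0
                # the same network, multicast address and port on BOTH nodes
                bindnetaddr:    192.168.30.0
                mcastaddr:      226.0.1.5
                mcastport:      5454
        }

The nodeid can stay different on each node, but everything inside interface { } should match.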

byau 03-04-2014 01:08 PM

I know this is an old thread, but I wanted to say I had the same issue (and beat my head against it for a long time). My cluster was having issues because there was configuration in two places. I was using SLES 11. Eventually I had a phone call with Novell/SUSE support, and it took them a while until they finally found out my config was in two places. It wasn't affecting heartbeat, but it was affecting cluster function.

That's the problem with having so many different documents to piece together when trying to build a cluster. I was looking at both Novell's docs and the ClusterLabs docs:

http://clusterlabs.org/quickstart-suse.html
https://www.suse.com/documentation/s...ook_sleha.html

If you're interested in the outline of steps I finally put together for myself, that's here:
http://geekswing.com/geek/building-a...ter-on-vmware/

It turned out to be much easier than I thought, just using the SLES GUI instead of working from the command line.

