LinuxQuestions.org
04-21-2013, 05:42 AM   #1
shalvie
LQ Newbie
Registered: Apr 2013
Posts: 11

Issues with DRBD/Pacemaker/CMAN/MySQL setup on RHEL 6.1 using ccs tool


Hello,

I've been having some issues making anything really happen with this setup. There isn't much to read regarding issues with this particular combination of software.

I am using:
RHEL 6.1 x32
DRBD 8.4.1
Pacemaker 1.1.8-7
CMAN 3.0.12.1
ccs 0.16.2
mysql-5.5.28

Basically I have configured this setup based on the DRBD user guide (including the bit about the cluster.conf file, http://www.drbd.org/users-guide/s-rh...-clusters.html) and the Pacemaker RHEL 6 quickstart guide (http://clusterlabs.org/quickstart-redhat.html), and changed my config a bit to match this suggestion (https://github.com/rozofs/rozofs/wik..._on_centos_6.3).

My /etc/drbd.d/mysql.res looks as follows:



# You can find an example in /usr/share/doc/drbd.../drbd.conf.example
resource mysql {

net {
# Automatic split brain recovery policies
after-sb-0pri discard-zero-changes;
after-sb-1pri discard-secondary;
after-sb-2pri call-pri-lost-after-sb;
}

syncer {
verify-alg sha1;
}


on shalva1 {
device /dev/drbd0;
disk /dev/system/mysql_drbd;
address 172.16.101.249:7788;
meta-disk internal;
}

on shalva2 {
device /dev/drbd0;
disk /dev/system/mysql_drbd;
address 172.16.101.250:7788;
meta-disk internal;
}

}

DRBD Status outputs:

shalva1
[root@shalva1 home]# service drbd status
drbd driver loaded OK; device status:
version: 8.4.1 (api:1/proto:86-100)
GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by gardner@, 2012-05-24 20:42:05
m:res cs ro ds p mounted fstype
0:mysql WFConnection Primary/Unknown UpToDate/DUnknown C

shalva2


[root@shalva2 ~]# service drbd status
drbd driver loaded OK; device status:
version: 8.4.1 (api:1/proto:86-100)
GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by gardner@, 2012-05-24 20:42:05
m:res cs ro ds p mounted fstype
0:mysql StandAlone Secondary/Unknown UpToDate/DUnknown r-----
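The cs column in the two outputs above reads WFConnection on shalva1 and StandAlone on shalva2, which means the nodes are not replicating to each other. A minimal sketch of pulling that column out in a script, assuming the one-line-per-resource layout shown above; the function name drbd_link_state is made up for illustration:

```shell
#!/bin/sh
# Sketch only: classify the connection state (cs column) of one resource line
# as printed by `service drbd status`. drbd_link_state is a hypothetical name.
drbd_link_state() {
    # $1: a full line, e.g. "0:mysql StandAlone Secondary/Unknown UpToDate/DUnknown r-----"
    printf '%s\n' "$1" | awk '{
        if ($2 == "Connected")         print "connected"
        else if ($2 == "WFConnection") print "waiting-for-peer"
        else if ($2 == "StandAlone")   print "standalone"
        else                           print "other:" $2
    }'
}
```

A StandAlone node does not try to reach its peer on its own. Per the DRBD 8.4 user guide, running drbdadm connect mysql on that node asks it to reconnect. If the kernel log then reports a split brain, the guide's manual recovery is drbdadm secondary mysql followed by drbdadm connect --discard-my-data mysql on the node whose changes are to be thrown away; which node that is has to be decided by the administrator.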


My /etc/cluster/cluster.conf:



<cluster config_version="10" name="mysql">
<fence_daemon/>
<clusternodes>
<clusternode name="shalva1" nodeid="1">
<fence>
<method name="pcmk-redirect">
<device name="pcmk" port="shalva1"/>
</method>
</fence>
</clusternode>
<clusternode name="shalva2" nodeid="2">
<fence>
<method name="pcmk-redirect">
<device name="pcmk" port="shalva2"/>
</method>
</fence>
</clusternode>
</clusternodes>
<cman/>
<fencedevices>
<fencedevice agent="fence_pcmk" name="pcmk"/>
</fencedevices>
<rm>
<failoverdomains/>
<resources/>
<service autostart="1" name="mysql">
<drbd name="mysql" resource="mysql">
<fs device="/dev/drbd0"
mountpoint="/var/lib/mysql"
fstype="ext3"
name="mysql"
options="noatime"/>
</drbd>
<ip address="172.16.101.251" monitor_link="1"/>
<mysql config_file="/etc/my.cnf"
listen_address="172.16.101.251"
name="mysqld"/>
</service>
</rm>

</cluster>


The cluster wouldn't take ownership of the node so I added a virtual IP via the pcs tool because I didn't know how else to do it:

[root@shalva2 ~]# pcs status
Last updated: Sun Apr 21 10:38:36 2013
Last change: Sat Apr 20 22:45:44 2013 via crmd on shalva1
Stack: cman
Current DC: shalva2 - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, unknown expected votes
1 Resources configured.


Online: [ shalva1 shalva2 ]

Full list of resources:

ip_mysql (ocf::heartbeat:IPaddr2): Started shalva2



From my understanding, after this setup /dev/drbd0 should be mounted on /var/lib/mysql on whichever node owns the IP, and everything should be happy. But I am pretty sure I am missing something, and I am not 100% sure what that is. If anyone could give some advice, or point me in the direction of some good documentation, I would greatly appreciate it.

Thanks.
 
04-22-2013, 06:05 PM   #2
gdizzle
Member
Registered: Jul 2012
Posts: 234

Firstly, you are trying to create an Active/Passive cluster for MySQL using shared storage with DRBD.

Please post your XML in a neat way; as it is, it's hard to read, let alone understand what you are doing. Use an online tool such as:
http://www.freeformatter.com/xml-formatter.html

Make sure the mysqld service is stopped, as the cluster will manage it, and make sure it's disabled at startup: chkconfig mysqld off

When I had both disks working as Primary and Secondary using DRBD, I moved the MySQL files to the mounted shared storage disk and symlinked the directory so the system still knew it existed.

Code:
mv -v /var/lib/mysql /mnt/drbd/lamp
ln -s /mnt/drbd/lamp/mysql /var/lib
Paste your:
Code:
cat /etc/corosync/corosync.conf
What are you using as a fencing device?

Do a
Code:
crm configure show
and show your cluster config in neat XML format, along with the Pacemaker config.

Run
Code:
tail -f /var/log/messages
and move resources from one node in your cluster to the other. What errors does it show?

Code:
crm resource move mysql_resource  shalva2
This moves all the services and the VIP from shalva1 to shalva2.
 
04-22-2013, 10:53 PM   #3
shalvie
LQ Newbie
Registered: Apr 2013
Posts: 11
Original Poster

Hi,

Thanks for the reply. First of all, I didn't have crm; it was not available with my distro, what I found on pkgs.org was not compatible with the OS, and yum complained of many dependencies that weren't in my repo, so ccs and pcs were my only tool options. Pcs was not getting me anywhere. Also, it's not part of what is used for the Red Hat HA cluster at all, according to my reading. It would seem that CentOS and Red Hat were the only OSs that had to use this model, as the pcs package was missing pcsd, which means you couldn't actually start the cluster with pcs. This took me about 4 hours to figure out. I was under a really tight deadline, so I spent another 5 hours on it and figured out how to get it up. I want to post it here in case anyone else is having a similar issue.

By following pages that connected to each other via the 'See Also' section on http://linux.die.net I got to the ccs tool commands, and I finally found the right command to use:

ccs -f /etc/cluster/cluster.conf -i --startall

I had to use the -i because it complained the cluster.conf was invalid yet gave no reason for saying that. /var/log/messages had no helpful output. It hadn't had more than two lines of output since I stopped using pcs.

Once I ran the correct command, after cleaning up the Pacemaker config file that I had messed up via pcs, it complained that I was missing ricci, so I downloaded ricci and tried again. This time it stated both my nodes were up and running, but again nothing happened.

I kept following the 'See Also' links and found that what I was missing was rgmanager. Once I started that up, my /var/log/messages finally filled up with messages about drbd. One of the things I saw in one of the many articles I read was making both drbd nodes secondary, at which point rgmanager kicked in and shalva1 got the virtual IP, mounted /dev/drbd0 again and started up mysql.

It would seem dealing with the Red Hat cluster model is all about dependency hell and a lack of documentation about what exactly needs to be installed. The Red Hat documentation talks all about the Add-On, but I was using the CentOS 6 base repo to get around the need for this add-on, per cluster.org's wiki, as they don't give it for free to students. I really had no step-by-step guide and had to collect crumbs to get to this point.

Sorry about the messy code; next time I'll know better.
 
04-22-2013, 11:32 PM   #4
gdizzle
Member
Registered: Jul 2012
Posts: 234

I understand your pain with the dependencies. I got my cluster working in RHEL 6; as I didn't have the Cluster Subscription with RHN, I used the Scientific Linux repo to get CRM and all the packages installed.

http://ftp.scientificlinux.org/linux/scientific/6.4/

Either way glad you have it all working, as setting up a cluster was a journey for me too.

I hope you setup Fencing.

There are tonnes of examples online with STONITH disabled and fencing left out. If you ever encounter "Split Brain" you will regret that you didn't set it up.
 
04-23-2013, 02:06 AM   #5
shalvie
LQ Newbie
Registered: Apr 2013
Posts: 11
Original Poster

So with the whole fencing thing I am a little confused as to what my status is there.

I setup fencing according to this link:

http://clusterlabs.org/quickstart-redhat.html

It says you have to set it up even if you don't use it. Does that mean it's activated unless I explicitly deactivate it? I haven't enabled STONITH yet for testing. I am working on seeing the result of different types of failures, and my results seem to be a bit disappointing, actually. If the database process itself goes down, the cluster tries to bring it back up, causing the virtual IP to disappear for 12-13 seconds. The database is unavailable for about a minute; I haven't exported exact data to know how many seconds for sure, but based on doing a ps -ef it appears to be a minute. If I do something to. If it fails to bring mysql back up, it fails over and takes another 12-13 seconds to attach the IP again, but it's about 2 minutes of downtime in 1 minute intervals.
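To get exact numbers rather than estimating from ps -ef, one option is to poll the service from another machine and log every up/down transition. A minimal sketch, assuming mysqladmin is installed on the machine doing the polling; watch_service is a hypothetical helper name, not part of any of the cluster tools:

```shell
#!/bin/sh
# Sketch only: run a probe command repeatedly and print a Unix timestamp
# each time the result flips between "up" and "down".
# watch_service is a hypothetical helper name.
watch_service() {
    probe=$1      # command to run; exit status 0 means "up"
    interval=$2   # seconds to sleep between probes
    rounds=$3     # number of probes to run
    prev=""
    i=0
    while [ "$i" -lt "$rounds" ]; do
        # $probe is left unquoted on purpose so it splits into command + args
        if $probe >/dev/null 2>&1; then state=up; else state=down; fi
        if [ "$state" != "$prev" ]; then
            echo "$(date +%s) $state"
            prev=$state
        fi
        i=$((i + 1))
        sleep "$interval"
    done
}
```

For example, watch_service "mysqladmin -h 172.16.101.251 ping" 1 300 would probe the virtual IP from the cluster.conf above roughly once a second, 300 times. The gap between a "down" line and the next "up" line is the outage in seconds. mysqladmin ping exits with 0 whenever the server is running, even if the login itself is refused, so no credentials are needed for this check.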

On the first major failure (DRBD failure, which is ridiculously difficult to force, or box disconnects; as of now I have just been rebooting) the IP never gets lost and it fails over immediately. After that it seems to take its time. When I shut down a box I get a quorum failure error, and it takes extreme efforts to bring the box back into the cluster. If I only attempt to restart the node on the box that died (it isn't actually able to stop it, but it tries and that seems to do something), the IP and mysql will fail over but drbd won't start. Rgmanager comes up at startup but drbd, pacemaker and ricci don't, so I need to start them manually. If I try to stop and start both nodes, the disruption is over a minute but the failover is normal.
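Whether a service starts at boot on RHEL 6 is controlled by chkconfig. A rough sketch for spotting which of a given set of services are not enabled in runlevel 3, assuming the usual chkconfig --list layout (service name followed by 0:off through 6:off columns); services_off_at_boot is a made-up helper name:

```shell
#!/bin/sh
# Sketch only: read `chkconfig --list` output on stdin and print those of the
# named services that are not switched on for runlevel 3.
# services_off_at_boot is a hypothetical helper name.
services_off_at_boot() {
    awk -v want="$*" '
        BEGIN { n = split(want, w, " "); for (i = 1; i <= n; i++) need[w[i]] = 1 }
        ($1 in need) {
            on = 0
            for (f = 2; f <= NF; f++) if ($f == "3:on") on = 1
            if (!on) print $1
        }
    '
}
```

It could be used as chkconfig --list | services_off_at_boot drbd ricci cman rgmanager, and each name it prints can then be enabled with chkconfig followed by the name and "on". Services that do not appear in the chkconfig output at all are not reported. Which services should start at boot and which should be left to the cluster manager depends on the setup, so the list here is only an example.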

If both nodes fail (network failure) my database gets wiped.

Am I doing something very wrong? Because if I am not I am not sure how this can be called a HA solution.
 
04-23-2013, 02:14 AM   #6
shalvie
LQ Newbie
Registered: Apr 2013
Posts: 11
Original Poster

Actually I take the drbd failure and data loss issue back. I triggered another failure a minute ago and I still have the data. When both boxes die the cluster seems to come up normally, but only for a few seconds, and then there is no longer a primary drbd node. A single node failure I am not recovering from well at all.
 
04-23-2013, 05:21 PM   #7
gdizzle
Member
Registered: Jul 2012
Posts: 234

OK, this thread could go on forever; there is so much involved with clusters, not to mention that there are different packages for them.

Jump onto Freenode and hit up the IRC Channel #linux-cluster, and #linux-ha.

Talk to the folk there and ask some questions. Make sure you paste all configs using fpaste or something similar.

You need to read all these guides: http://clusterlabs.org/doc/ back to front and then ask questions to the guys in the IRC channel. Trust me, there is more than meets the eye with these setups, and even when you think you have it right, you will learn something new.
 
04-23-2013, 09:13 PM   #8
shalvie
LQ Newbie
Registered: Apr 2013
Posts: 11
Original Poster

Have actually been cursing the day I decided an HA cluster would be an excellent idea to be part of my thesis.

I have been sitting in the #linux-cluster channel since Sunday (15 hours a day) but no dice. Will try #linux-ha though.

I actually finally saw in /var/log/messages, during one of my tests, that it wanted to activate STONITH but it wasn't defined, so I am making slow progress. I read a lot of the cluster guides for CMAN and Pacemaker 1.1, but they use a lot of commands unavailable to me and I find their documentation to be cryptic where fencing is concerned. Maybe talking to people directly would help.

Thanks for your help.
 
  

