LinuxQuestions.org
07-08-2008, 07:58 AM   #1
your_shadow03
Senior Member
 
Registered: Jun 2008
Location: Germany
Distribution: Slackware
Posts: 1,466
Blog Entries: 6

Rep: Reputation: 51
Help on Clustering???


I have installed the Red Hat Cluster Suite packages on RHEL 4 Update 2 and am trying to set up a two-node cluster.
I have two machines, 10.14.236.108 (BL01DL385) and 10.14.236.106 (BL02DL385). I have added the nodes, a failover domain, a service, and resources.
My cluster.conf file on both servers is:
Code:
<?xml version="1.0" ?>
<cluster alias="Test_Cluster" config_version="17" name="Test_Cluster">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="BL02DL385" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="BL01DL385" votes="1">
                        <fence/>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices/>
        <rm>
                <failoverdomains/>
                <resources>
                        <script file="/home/fsadmin/featureserver/scripts/featureserver.sh" name="featureserver.sh"/>
                        <ip address="10.14.236.200" monitor_link="1"/>
                </resources>
                <service autostart="0" name="MPAPP" recovery="relocate">
                        <script ref="featureserver.sh"/>
                        <ip ref="10.14.236.200"/>
                </service>
        </rm>
</cluster>
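
A file like the one above can be sanity-checked by parsing it and flagging settings that tend to matter in a two-node test setup. The following is only a sketch in Python 3 (run on any workstation, not necessarily on the RHEL 4 nodes); the function name `check_cluster_conf` and the trimmed `SAMPLE` copy are made up for illustration, and the checks reflect one reading of the posted file, not an official validator.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the posted cluster.conf (resources omitted for brevity).
SAMPLE = """<cluster alias="Test_Cluster" config_version="17" name="Test_Cluster">
  <clusternodes>
    <clusternode name="BL02DL385" votes="1"><fence/></clusternode>
    <clusternode name="BL01DL385" votes="1"><fence/></clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <rm>
    <failoverdomains/>
    <service autostart="0" name="MPAPP" recovery="relocate"/>
  </rm>
</cluster>"""


def check_cluster_conf(xml_text):
    """Return (node_names, warnings) for a cluster.conf document."""
    root = ET.fromstring(xml_text)
    warnings = []

    nodes = root.findall("./clusternodes/clusternode")
    names = [n.get("name") for n in nodes]

    # two_node="1" is the special two-node mode and goes with expected_votes="1".
    cman = root.find("./cman")
    if cman is not None and cman.get("two_node") == "1":
        if len(nodes) != 2:
            warnings.append("two_node=1 but %d nodes are defined" % len(nodes))
        if cman.get("expected_votes") != "1":
            warnings.append("two_node=1 normally goes with expected_votes=1")

    # An empty <fence/> element means no fence method is configured.
    for node in nodes:
        fence = node.find("fence")
        if fence is None or len(fence) == 0:
            warnings.append("node %s has no fence method" % node.get("name"))

    domains = root.find("./rm/failoverdomains")
    if domains is None or len(domains) == 0:
        warnings.append("no failover domain is defined")

    for svc in root.findall("./rm/service"):
        if svc.get("autostart") == "0":
            warnings.append(
                "service %s has autostart=0 and must be enabled by hand"
                % svc.get("name")
            )

    return names, warnings


names, warnings = check_cluster_conf(SAMPLE)
print("nodes:", ", ".join(names))
for w in warnings:
    print("warning:", w)
```

On the posted file this flags, among other things, that `<failoverdomains/>` is empty even though a failover domain was reportedly added, which may mean the saved file is not the latest version from the GUI.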
Then, on 10.14.236.106, I ran:

# service ccsd start
# service cman start

I can now see the Cluster Management option in the system-config-cluster GUI, but I don't see the other node online. So I copied the configuration manually to the other (108) system, and both servers now have the same cluster.conf shown above.

The issue remains, though: the two machines still don't detect each other.
I am ignoring fencing for now, since all I want is for one script to run on the other node if the primary server (106) goes down.

How should I troubleshoot this?
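
One thing worth ruling out when two nodes never see each other is name resolution: each name in cluster.conf should resolve, on both machines, to the address the other node can actually reach, and not to a loopback address from /etc/hosts. The sketch below is an assumption about a likely cause, not a diagnosis of this particular setup; `check_node_addresses` and the `fake_lookup` stand-in are hypothetical names, and the expected addresses are taken from the post. Running `cman_tool status` and `cman_tool nodes` on each node, and checking for firewall rules between them, would be the natural companion checks.

```python
import socket

# Node name -> address, as stated in the post.
EXPECTED = {"BL01DL385": "10.14.236.108", "BL02DL385": "10.14.236.106"}


def check_node_addresses(expected, resolve=socket.gethostbyname):
    """Compare what each node name resolves to against the expected address.

    Returns a list of (name, problem) tuples; an empty list means every
    name resolved to the address we expected.
    """
    problems = []
    for name, want in expected.items():
        try:
            got = resolve(name)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            problems.append((name, "does not resolve: %s" % exc))
            continue
        if got.startswith("127."):
            problems.append((name, "resolves to loopback %s" % got))
        elif got != want:
            problems.append((name, "resolves to %s, expected %s" % (got, want)))
    return problems


def fake_lookup(name):
    """Simulated resolver for illustration: BL01DL385 maps to loopback."""
    table = {"BL01DL385": "127.0.0.1", "BL02DL385": "10.14.236.106"}
    return table[name]


# Pass no resolve= argument to query the machine's real resolver instead.
problems = check_node_addresses(EXPECTED, resolve=fake_lookup)
for name, problem in problems:
    print("%s: %s" % (name, problem))
```

The check would need to be run on both nodes, since each machine has its own /etc/hosts.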
 
07-08-2008, 09:27 AM   #2
your_shadow03
Senior Member
 
Registered: Jun 2008
Location: Germany
Distribution: Slackware
Posts: 1,466

Original Poster
Blog Entries: 6

Rep: Reputation: 51
Whenever I start the cman service it reports OK, but the log shows:
Code:
Last login: Tue Jul  8 19:56:33 2008 from 10.14.2.254
[root@BL01DL385 ~]# tail -f /var/log/messages
Jul  8 19:57:13 BL01DL385 kernel: clustat[10995]: segfault at 0000000000000100 rip 00000000004036b7 rsp 0000007fbffffbc0 error 4
Jul  8 19:57:15 BL01DL385 kernel: WCI_nim[12614]: segfault at 00000000007d3000 rip 000000378dd716b7 rsp 0000000041400938 error 4
Jul  8 19:57:16 BL01DL385 kernel: WCI_nim[13391]: segfault at 00000000007d3000 rip 000000378dd716b7 rsp 0000000041400938 error 4
Jul  8 19:57:17 BL01DL385 kernel: WCI_nim[14862]: segfault at 00000000007d3000 rip 000000378dd716b7 rsp 0000000041400938 error 4
Jul  8 19:57:18 BL01DL385 ccsd[22078]: Cluster is not quorate.  Refusing connection.
Jul  8 19:57:18 BL01DL385 ccsd[22078]: Error while processing connect: Connection refused
Jul  8 19:57:18 BL01DL385 kernel: clustat[15033]: segfault at 0000000000000100 rip 00000000004036b7 rsp 0000007fbffffbc0 error 4
Jul  8 19:57:18 BL01DL385 crond(pam_unix)[844]: session closed for user smsafe
Jul  8 19:57:18 BL01DL385 crond(pam_unix)[852]: session closed for user mni
Jul  8 19:57:20 BL01DL385 kernel: WCI_nim[16560]: segfault at 00000000007d3000 rip 000000378dd716b7 rsp 0000000041400938 error 4
Jul  8 19:57:23 BL01DL385 ccsd[22078]: Cluster is not quorate.  Refusing connection.
Jul  8 19:57:23 BL01DL385 ccsd[22078]: Error while processing connect: Connection refused
Jul  8 19:57:23 BL01DL385 kernel: clustat[19005]: segfault at 0000000000000100 rip 00000000004036b7 rsp 0000007fbffffbc0 error 4
Jul  8 19:57:24 BL01DL385 kernel: WCI_nim[19844]: segfault at 00000000007d3000 rip 000000378dd716b7 rsp 0000000041400938 error 4
Jul  8 19:57:25 BL01DL385 kernel: CMAN: forming a new cluster
Jul  8 19:57:25 BL01DL385 kernel: CMAN: quorum regained, resuming activity
Jul  8 19:57:25 BL01DL385 ccsd[22078]: Cluster is quorate.  Allowing connections.
Jul  8 19:57:25 BL01DL385 cman: startup succeeded
Jul  8 19:57:29 BL01DL385 kernel: WCI_nim[24011]: segfault at 00000000007d3000 rip 000000378dd716b7 rsp 0000000041400938 error 4
Jul  8 19:57:30 BL01DL385 kernel: WCI_nim[24754]: segfault at 00000000007d3000 rip 000000378dd716b7 rsp 0000000041400938 error 4
Jul  8 19:57:31 BL01DL385 kernel: WCI_nim[26023]: segfault at 00000000007d3000 rip 000000378dd716b7 rsp 0000000041400938 error 4
Jul  8 19:57:34 BL01DL385 kernel: WCI_nim[27930]: segfault at 00000000007d3000 rip 000000378dd716b7 rsp 0000000041400938 error 4
Jul  8 19:57:36 BL01DL385 kernel: WCI_nim[29533]: segfault at 00000000007d3000 rip 000000378dd716b7 rsp 0000000041400938 error 4
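
The log above interleaves cluster messages (ccsd, CMAN, cman) with segfaults from other programs, so it can help to group lines by the program they are about before reading further. This is a minimal Python 3 sketch with hypothetical helper names (`source_of`, `summarize`); the sample lines are copied from the log, and the regular expressions only assume the syslog layout visible there.

```python
import re
from collections import Counter

# Matches "Jul  8 19:57:13 BL01DL385 <message>" and captures host and message.
LINE_RE = re.compile(r"^\w{3}\s+\d+\s+[\d:]{8}\s+(\S+)\s+(.*)$")
# Kernel segfault reports name the crashing program: "kernel: prog[pid]: segfault".
SEGFAULT_RE = re.compile(r"^kernel: (\S+?)\[\d+\]: segfault")
# Otherwise the tag is everything before the first space, colon, or bracket.
TAG_RE = re.compile(r"^([^\s:\[]+)")

SAMPLE = """\
Jul  8 19:57:13 BL01DL385 kernel: clustat[10995]: segfault at 0000000000000100 rip 00000000004036b7 rsp 0000007fbffffbc0 error 4
Jul  8 19:57:15 BL01DL385 kernel: WCI_nim[12614]: segfault at 00000000007d3000 rip 000000378dd716b7 rsp 0000000041400938 error 4
Jul  8 19:57:18 BL01DL385 ccsd[22078]: Cluster is not quorate.  Refusing connection.
Jul  8 19:57:25 BL01DL385 kernel: CMAN: forming a new cluster
Jul  8 19:57:25 BL01DL385 kernel: CMAN: quorum regained, resuming activity
Jul  8 19:57:25 BL01DL385 cman: startup succeeded
"""


def source_of(message):
    """Name the program a syslog message is about."""
    m = SEGFAULT_RE.match(message)
    if m:
        return m.group(1) + " (segfault)"
    if message.startswith("kernel: CMAN:"):
        return "CMAN"
    m = TAG_RE.match(message)
    return m.group(1) if m else "unknown"


def summarize(lines):
    """Count log lines per source, skipping lines that are not syslog-shaped."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            counts[source_of(m.group(2))] += 1
    return counts


counts = summarize(SAMPLE.splitlines())
for source, n in counts.most_common():
    print("%-22s %d" % (source, n))
```

Grouped this way, the clustat segfaults stand out as cluster-related, while the WCI_nim segfaults appear to come from a different application and may be a separate problem.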
 
07-08-2008, 09:34 AM   #3
your_shadow03
Senior Member
 
Registered: Jun 2008
Location: Germany
Distribution: Slackware
Posts: 1,466

Original Poster
Blog Entries: 6

Rep: Reputation: 51
And the log output during rgmanager startup is:
Code:
login as: root
root@10.14.236.108's password:
Last login: Tue Jul  8 20:05:43 2008 from 10.14.2.254
[root@BL01DL385 ~]# tail -f /var/log/messages
Jul  8 20:06:17 BL01DL385 ccsd[1332]: Unable to write package back to sender: Broken pipe
Jul  8 20:06:17 BL01DL385 last message repeated 7253 times
Jul  8 20:06:17 BL01DL385 kernel: WCI_nim[7048]: segfault at 00000000007d3000 rip 000000378dd716b7 rsp 0000000041400938 error 4
Jul  8 20:06:17 BL01DL385 ccsd[1332]: Unable to write package back to sender: Broken pipe
Jul  8 20:06:18 BL01DL385 last message repeated 33298 times
Jul  8 20:06:18 BL01DL385 crond(pam_unix)[27724]: session closed for user smsafe
Jul  8 20:06:18 BL01DL385 ccsd[1332]: Unable to write package back to sender: Broken pipe
Jul  8 20:06:18 BL01DL385 last message repeated 791 times
Jul  8 20:06:18 BL01DL385 crond(pam_unix)[27736]: session closed for user mni
Jul  8 20:06:18 BL01DL385 ccsd[1332]: Unable to write package back to sender: Broken pipe
Jul  8 20:06:19 BL01DL385 last message repeated 23039 times
Jul  8 20:06:19 BL01DL385 kernel: WCI_nim[8666]: segfault at 00000000007d3000 rip 000000378dd716b7 rsp 0000000041400938 error 4
Jul  8 20:06:19 BL01DL385 ccsd[1332]: Unable to write package back to sender: Broken pipe
Jul  8 20:06:20 BL01DL385 last message repeated 26664 times
Jul  8 20:06:20 BL01DL385 kernel: WCI_nim[9357]: segfault at 00000000007d3000 rip 000000378dd716b7 rsp 0000000041400938 error 4
Jul  8 20:06:20 BL01DL385 ccsd[1332]: Unable to write package back to sender: Broken pipe
Jul  8 20:06:21 BL01DL385 last message repeated 23196 times
Jul  8 20:06:21 BL01DL385 kernel: WCI_nim[9980]: segfault at 00000000007d3000 rip 000000378dd716b7 rsp 0000000041400938 error 4
Jul  8 20:06:21 BL01DL385 ccsd[1332]: Unable to write package back to sender: Broken pipe
Jul  8 20:06:21 BL01DL385 last message repeated 6593 times
Jul  8 20:06:21 BL01DL385 kernel: WCI_nim[10169]: segfault at 00000000007d3000 rip 000000378dd716b7 rsp 0000000041400938 error 4
Jul  8 20:06:21 BL01DL385 ccsd[1332]: Unable to write package back to sender: Broken pipe
Jul  8 20:06:22 BL01DL385 last message repeated 8330 times
Jul  8 20:06:22 BL01DL385 kernel: WCI_nim[10467]: segfault at 00000000007d3000 rip 000000378dd716b7 rsp 0000000041400938 error 4
Jul  8 20:06:22 BL01DL385 ccsd[1332]: Unable to write package back to sender: Broken pipe
Jul  8 20:06:22 BL01DL385 last message repeated 4529 times
Jul  8 20:06:22 BL01DL385 kernel: WCI_nim[10628]: segfault at 00000000007d3000 rip 000000378dd716b7 rsp 0000000041400938 error 4
Jul  8 20:06:22 BL01DL385 ccsd[1332]: Unable to write package back to sender: Broken pipe
Jul  8 20:06:23 BL01DL385 last message repeated 25850 times
Jul  8 20:06:23 BL01DL385 kernel: WCI_nim[11323]: segfault at 00000000007d3000 rip 000000378dd716b7 rsp 0000000041400938 error 4
Jul  8 20:06:23 BL01DL385 ccsd[1332]: Unable to write package back to sender: Broken pipe
Jul  8 20:06:23 BL01DL385 last message repeated 2237 times
Jul  8 20:06:23 BL01DL385 kernel: WCI_nim[11404]: segfault at 00000000007d3000 rip 000000378dd716b7 rsp 0000000041400938 error 4
Jul  8 20:06:23 BL01DL385 ccsd[1332]: Unable to write package back to sender: Broken pipe
Jul  8 20:06:24 BL01DL385 last message repeated 21149 times
Jul  8 20:06:24 BL01DL385 kernel: WCI_nim[11960]: segfault at 00000000007d3000 rip 000000378dd716b7 rsp 0000000041400938 error 4
Jul  8 20:06:24 BL01DL385 ccsd[1332]: Unable to write package back to sender: Broken pipe
Jul  8 20:06:28 BL01DL385 last message repeated 102989 times
Jul  8 20:06:28 BL01DL385 kernel: WCI_nim[15019]: segfault at 00000000007d3000 rip 000000378dd716b7 rsp 0000000041400938 error 4
Jul  8 20:06:28 BL01DL385 ccsd[1332]: Unable to write package back to sender: Broken pipe
Jul  8 20:06:29 BL01DL385 last message repeated 17279 times
Jul  8 20:06:29 BL01DL385 kernel: WCI_nim[15509]: segfault at 00000000007d3000 rip 000000378dd716b7 rsp 0000000041400938 error 4
Jul  8 20:06:29 BL01DL385 ccsd[1332]: Unable to write package back to sender: Broken pipe
Jul  8 20:06:29 BL01DL385 last message repeated 16346 times
Jul  8 20:06:29 BL01DL385 kernel: WCI_nim[15931]: segfault at 00000000007d3000 rip 000000378dd716b7 rsp 0000000041400938 error 4
Jul  8 20:06:29 BL01DL385 ccsd[1332]: Unable to write package back to sender: Broken pipe
Jul  8 20:06:30 BL01DL385 last message repeated 18077 times
Jul  8 20:06:30 BL01DL385 kernel: WCI_nim[16398]: segfault at 00000000007d3000 rip 000000378dd716b7 rsp 0000000041400938 error 4
Jul  8 20:06:30 BL01DL385 ccsd[1332]: Unable to write package back to sender: Broken pipe
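
Because syslog collapses identical consecutive messages into "last message repeated N times", the real volume of the ccsd error is much larger than the number of visible lines. The Python 3 sketch below adds those repeat counts back in; `count_with_repeats` is a hypothetical helper, and the sample is the first five lines of the log above.

```python
import re

REPEAT_RE = re.compile(r"last message repeated (\d+) times\s*$")

SAMPLE = [
    "Jul  8 20:06:17 BL01DL385 ccsd[1332]: Unable to write package back to sender: Broken pipe",
    "Jul  8 20:06:17 BL01DL385 last message repeated 7253 times",
    "Jul  8 20:06:17 BL01DL385 kernel: WCI_nim[7048]: segfault at 00000000007d3000 rip 000000378dd716b7 rsp 0000000041400938 error 4",
    "Jul  8 20:06:17 BL01DL385 ccsd[1332]: Unable to write package back to sender: Broken pipe",
    "Jul  8 20:06:18 BL01DL385 last message repeated 33298 times",
]


def count_with_repeats(lines, needle):
    """Count messages containing needle, expanding syslog repeat markers.

    A "last message repeated N times" line is credited to the message
    logged immediately before it.
    """
    total = 0
    last_matched = False
    for line in lines:
        m = REPEAT_RE.search(line)
        if m:
            if last_matched:
                total += int(m.group(1))
            continue
        last_matched = needle in line
        if last_matched:
            total += 1
    return total


broken_pipes = count_with_repeats(SAMPLE, "Broken pipe")
print("ccsd 'Broken pipe' messages:", broken_pipes)
```

Tens of thousands of identical errors within about a second suggest ccsd is stuck in a tight loop answering a client that has already gone away, rather than many independent failures; that reading is an inference from the log, not something the thread confirms.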
 
07-08-2008, 03:42 PM   #4
xiaodown
Member
 
Registered: Jun 2003
Location: Virginia
Distribution: Redhat, Centos, Fedora
Posts: 37

Rep: Reputation: 15
I hate to be that guy, but I don't know a lot about this built-in type of clustering.

Is this high performance clustering (splitting one task up in parallel), or is this high availability clustering (lots of back end servers for increased capacity / redundancy)?

If it's the latter, I wouldn't mess with this stuff. I'd use (and do use) ipvs / keepalived, and linux heartbeat. An optimal setup would be 2x front end machines with a virtual IP that can float between them in the case of fail-over. If these are routers, they can share failover responsibility with VRRP. Otherwise, linux-ha is capable of starting and stopping services, as well as upping and downing IPs. Then, if you set up IPVS, you can have any number of "virtual servers" listening on a port and forwarding to any number of "real servers" on the backend. Subsequently, you can set up keepalived to monitor the health of the back-end "real servers" and insert/remove them from the ipvsadm tables.

That way, the back end machines don't need to be aware of each other, and the front end machines are redundant.

~X
 
07-08-2008, 10:58 PM   #5
your_shadow03
Senior Member
 
Registered: Jun 2008
Location: Germany
Distribution: Slackware
Posts: 1,466

Original Poster
Blog Entries: 6

Rep: Reputation: 51
It's just a test setup, and it's not the latter (load balancing). We simply have a script that runs on the active server and should fail over to the other node if the first machine goes down. That script is our only application, and the clustering should handle it.
 
07-09-2008, 12:16 AM   #6
your_shadow03
Senior Member
 
Registered: Jun 2008
Location: Germany
Distribution: Slackware
Posts: 1,466

Original Poster
Blog Entries: 6

Rep: Reputation: 51
Anyway, I got it running: the perl-Crypt-SSLeay-0.51-5.x86_64.rpm package had been missing.
Now the two nodes are online. What next?
Should I reboot one server so that the script runs on the other server?
Is that possible?
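
Before rebooting anything, a gentler first test is to move the service by hand with `clusvcadm`, the rgmanager tool for enabling, disabling, and relocating services. Since the posted cluster.conf sets autostart="0" for MPAPP, the service would also need to be enabled once before it runs anywhere. The Python 3 sketch below only builds and prints the command lines (dry run by default); the helper names are hypothetical, the service and node names come from the posted configuration, and the choice of which node to start on is arbitrary.

```python
import subprocess

# clusvcadm flags: -e enable, -d disable, -r relocate, -m target member.
FLAGS = {"enable": "-e", "disable": "-d", "relocate": "-r"}


def clusvcadm_command(action, service, member=None):
    """Build a clusvcadm command line for one of a few common actions."""
    cmd = ["clusvcadm", FLAGS[action], service]
    if member is not None:
        cmd += ["-m", member]
    return cmd


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.check_call(cmd)


# Start the service on one node, then ask rgmanager to move it to the other.
enable_cmd = clusvcadm_command("enable", "MPAPP", member="BL02DL385")
relocate_cmd = clusvcadm_command("relocate", "MPAPP", member="BL01DL385")
run(enable_cmd)
run(relocate_cmd)
```

Running `clustat` after each step should show which node owns MPAPP. A clean reboot of the owning node is a reasonable next test; as far as I understand, an unclean failure (power pull, network cut) may not recover while the `<fence/>` blocks are empty, because recovery waits for the failed node to be fenced, so that is worth verifying against the Red Hat documentation.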
 
  


All times are GMT -5.
