LinuxQuestions.org
08-19-2010, 08:40 PM   #1
kumarlimbu
LQ Newbie
 
Registered: Aug 2010
Posts: 1

Cluster failure with error (unmanaged) FAILED


Hi,



We are using Linux HA to manage our cluster of 2 web servers.



Both web servers run identical software (the OS, HTTP server, Tomcat, etc. are all the same) but on different hardware. Both servers have 64-bit processors.



The following software is in use:

1. CentOS 5.4
2. Pacemaker 1.0.5
3. OpenAIS 0.80
4. Cluster-glue 1.0-12
5. Resource agents: ocf, heartbeat



Under normal circumstances both the IPs are accessible and everything seems to be working well.



============
Last updated: Thu Aug 19 17:24:34 2010
Stack: openais
Current DC: server1 - partition with quorum
Version: 1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ server1 server2 ]

ClusterIP1 (ocf::heartbeat:IPaddr2): Started server1
ClusterIP2 (ocf::heartbeat:IPaddr2): Started server2





We need to copy new or updated files to our servers periodically, and the server receiving the files becomes slow during this operation. So while files are being copied to server1, we put it into standby mode by issuing the command

- crm node standby



Output of crm_mon command during this time:

============
Last updated: Thu Aug 19 17:43:14 2010
Stack: openais
Current DC: server1 - partition with quorum
Version: 1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Node server1: standby
Online: [ server2 ]

ClusterIP1 (ocf::heartbeat:IPaddr2): Started server2
ClusterIP2 (ocf::heartbeat:IPaddr2): Started server2





During this time every request is handled by server2. After the files are copied, we bring server1 back online using



- crm node online
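
For reference, the standby, copy, online routine described above could be wrapped in a small shell function. This is only a sketch: the name `maintenance_copy`, its arguments, and the use of `rsync` as the copy tool are assumptions made for illustration and are not taken from the setup described here. Only the two `crm node` commands come from the post.

```shell
# Sketch of the maintenance routine: standby, copy, online.
# maintenance_copy, its arguments, and rsync are hypothetical choices.
maintenance_copy() {
    node="$1"   # cluster node to take out of service, e.g. server1
    src="$2"    # source of the new/updated files (placeholder)
    dest="$3"   # destination on the node (placeholder)

    # Ask the cluster to move all resources off the node.
    crm node standby "$node" || return 1

    # Copy the files; remember the result but do not bail out yet,
    # so the node is brought back online even if the copy fails.
    rsync -a "$src" "$dest"
    copy_rc=$?

    # Let the node host resources again.
    crm node online "$node" || return 1

    return "$copy_rc"
}
```

It would be called as `maintenance_copy server1 <source> <destination>`. As far as I know, `crm node standby` returns before the resources have finished moving, so a real script would probably want to check `crm_mon -1` before starting the copy.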



This setup has been working well for us: the servers go into standby mode and come back online without much trouble. Recently, however, one of the IPs has started becoming inaccessible, and it is always ClusterIP2. It does not return to normal until that server is restarted.



Output of the crm_mon command when the ClusterIP2 is inaccessible:

============
Last updated: Thu Aug 19 09:12:52 2010
Stack: openais
Current DC: server1 - partition with quorum
Version: 1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ server1 server2 ]

ClusterIP1 (ocf::heartbeat:IPaddr2): Started server1
ClusterIP2 (ocf::heartbeat:IPaddr2): Started server1 (unmanaged) FAILED

Failed actions:
ClusterIP2_stop_0 (node=server1, call=5400, rc=1, status=complete): unknown error
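
A state like the one above can be spotted automatically by scanning one-shot `crm_mon` output for resource lines that end in FAILED. The sketch below assumes the output layout shown in this post (resource name in the first column, FAILED as the last word); the function name `failed_resources` is made up for illustration.

```shell
# Reads crm_mon output on stdin and prints the name of every resource
# whose status line ends with the word FAILED.
# Assumes the layout shown in this post; other versions may differ.
failed_resources() {
    awk '$NF == "FAILED" { print $1 }'
}
```

On a cluster node this would be used as `crm_mon -1 | failed_resources`, for example to skip the file copy when a resource is already in a failed state.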





We are baffled because this problem is occurring more and more often, and we haven't modified any of the cluster settings.



- What usually causes one of the IP addresses to become inaccessible?

- What settings could we change to avoid this situation in the future?
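
While looking into the questions above, one possible way to inspect the failed stop and try to recover without a reboot is sketched here. It is an assumption that this would help in this particular case: the function name `inspect_and_cleanup` is hypothetical, it has to run on the node that reported the failure, and whether `crm resource cleanup` behaves the same on Pacemaker 1.0.5 has not been verified here.

```shell
# Hypothetical helper: look at a resource whose stop failed, then clear
# the failure so the cluster can probe and manage it again.
inspect_and_cleanup() {
    rsc="$1"    # resource name from crm_mon, e.g. ClusterIP2
    node="$2"   # node that reported the failed action, e.g. server1

    # One-shot cluster status, including the "Failed actions" section.
    crm_mon -1

    # Addresses currently configured on this host, to see whether the
    # failed stop left the cluster IP behind.
    ip addr show

    # Clear the recorded failure for the resource on that node.
    crm resource cleanup "$rsc" "$node"
}
```

For the state shown earlier it would be called as `inspect_and_cleanup ClusterIP2 server1`. The cleanup only clears the recorded failure; the reason the stop returned rc=1 would still need to be found in the logs.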



If more information about our configuration or logs is required, please let me know.
 
  

