Linux - Server: This forum is for the discussion of Linux software used in a server-related context.
I am new to this forum. I am here because of a production server cluster issue that has me completely stuck. I am new to Red Hat Cluster as well.
This is a 2-node cluster, and the operating system installed on both nodes is RHEL 5.3.
The system was running fine until last week, when one of the two nodes (node2) suddenly went offline.
All the cluster-related services were hung and the server could not be rebooted. I had to kill the rgmanager service to reboot it. The system then rebooted and came up in cluster mode, which took the other node (node1) offline.
All I could understand from this is that the cluster is unable to keep both nodes online simultaneously. The same thing happened when I rebooted node1: it killed node2 when it came back up.
I have now kept node2 down so that the production application installed on this server can keep running.
I look forward to your replies, as this is a serious issue for me in a production environment.
Logs from node1, taken when node2 was booted into the cluster, are pasted here for reference.
MESSAGE FILE OUTPUT
---------------------
Feb 2 15:06:39 htbapp1 openais[3840]: [SYNC ] This node is within the primary component and will provide service.
Feb 2 15:06:39 htbapp1 kernel: Intel(R) Xeon(R) CPU E5520 @ 2.27GHz stepping 05
Feb 2 15:06:39 htbapp1 openais[3840]: [TOTEM] entering OPERATIONAL state.
Feb 2 15:06:39 htbapp1 kernel: Brought up 8 CPUs
Feb 2 15:06:39 htbapp1 openais[3840]: [MAIN ] Killing node htbapp2.ksebnet.com because it has rejoined the cluster with existing state
Feb 2 15:06:39 htbapp1 kernel: testing NMI watchdog ... OK.
Feb 2 15:06:40 htbapp1 kernel: time.c: Using 14.318180 MHz WALL HPET GTOD HPET/TSC timer.
Feb 2 15:06:40 htbapp1 kernel: time.c: Detected 2266.835 MHz processor.
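The line that matters in this excerpt is the openais `[MAIN ]` message about killing htbapp2 after it rejoined with existing state. As a rough sketch for anyone scanning a larger log for the same symptom, the filter below prints the name of each node reported killed this way. The function name `killed_nodes` is made up for illustration, and `/var/log/messages` is assumed to be where these lines came from.

```shell
#!/bin/sh
# Print the name of every node that openais reports killing on rejoin.
# Reads syslog-style lines on stdin. The message wording is copied from
# the log excerpt above; other versions may phrase it differently.
killed_nodes() {
    sed -n 's/.*\[MAIN *] Killing node \([^ ]*\) because it has rejoined the cluster with existing state.*/\1/p'
}

# Example (path is an assumption):
#   killed_nodes < /var/log/messages
```

Lines that do not match the message are dropped, so kernel boot messages interleaved in the same file do not get in the way.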
Nope. The application running on this server is JBoss, and it is a production system.
It doesn't matter what application is running in the cluster; what worries me is that the cluster processes will not run on both nodes at the same time.
I set up a JBoss server on RHEL5 x86_64, but it was a couple of years ago. If I remember correctly it was challenging, as the setup was pretty complex.
Do you have support with Red Hat? They have JBoss support. When I first started down this path I had to use Red Hat support, since it was new to me and to the company.
The reason I posted this thread here is that our RHEL support expired last November and this problem started last month. So I had to seek help from the Linux experts here. This looks like a cluster bug and I have no idea how to get rid of it.
If it is a bug, could you migrate over to CentOS with your existing configs? There it is possible to download updates.
That way you could work towards a resolution if you cannot otherwise download updates; just something to throw out there.
As with any software clustering suite, these can be very complex, and you may have to break down and purchase support if it is a production system. You have to weigh the cost of being down against paying for one year of support to get help with it.
Since this is a production system, I cannot switch operating systems. I could possibly convince my manager to renew the support. But I wonder if I could get the right resolution from here.
This issue has been resolved! The culprit was the "acpid" (power management) daemon, which is not supposed to be running on cluster nodes and caused them to malfunction. The cluster started working perfectly once acpid was stopped from starting at boot.
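The post does not say which commands were used to disable acpid. On a RHEL 5 style SysV init system, one plausible way is `service` to stop the daemon and `chkconfig` to keep it out of the boot runlevels. The sketch below is a dry run by default: it only prints the commands, and the `APPLY` variable and the helper names `run` and `disable_acpid` are illustrative, not from the thread.

```shell
#!/bin/sh
# Stop acpid and keep it from starting at boot (assumes RHEL 5 SysV init).
# Dry run by default: prints each command. Set APPLY=1 and run as root
# to actually execute them.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

disable_acpid() {
    run service acpid stop     # stop the running daemon
    run chkconfig acpid off    # disable it in the boot runlevels
}

disable_acpid
```

This would need to be done on every cluster node, and `chkconfig --list acpid` can be used afterwards to check that it is off in all runlevels.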