LinuxQuestions.org


iprince 02-04-2014 08:48 PM

Server Hung with error message: ERROR: Message hist queue is filling up
 
Hi Guys,

First, please excuse my poor English. :)

I have a 2-node cluster running RHEL with pacemaker.

Recently, one of the cluster nodes hung (pingable, but with a black screen on the KVM, and ssh was not possible either). The other cluster node was active and serving the mysql service, but users reported that the application couldn't access the database.

I had no choice but to reboot the system (to get the console back). After the reboot, the cluster was back to normal.

Lots of ERROR messages appeared (every second) in the messages log file on the affected cluster node, as below:

ERROR: Message hist queue is filling up (500 messages in queue)
WARN: Gmain_timeout_dispatch: Dispatch function for send_reqnodes_msg took too long to execute: 240 ms (> 100 ms) (GSource: 0x9c3f660)

Could anyone advise what went wrong on my system, and how the above messages relate to the hang?

Your advice is highly appreciated. Thanks.
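
To gauge when the queue error quoted above started and how often it repeated, the matching lines can be counted per minute. This is a minimal sketch, not something posted in the thread: the function name `count_hist_queue_errors` is hypothetical, and it assumes the log is `/var/log/messages` with the classic syslog timestamp (`Mon DD HH:MM:SS`) at the start of each line.

```shell
# Count "Message hist queue is filling up" errors per minute in a syslog file.
# Assumption: each line starts with a "Mon DD HH:MM:SS" timestamp.
count_hist_queue_errors() {
    logfile="${1:-/var/log/messages}"
    # Keep only the queue errors, reduce each timestamp to "Mon DD HH:MM",
    # then count consecutive identical minutes (the log is already in time order).
    grep -F 'Message hist queue is filling up' "$logfile" |
        awk '{ print $1, $2, substr($3, 1, 5) }' |
        uniq -c
}
```

Calling `count_hist_queue_errors /var/log/messages` prints one line per minute with the number of matching errors in front, which makes it easier to line up the start of the flood with the time of the hang.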

myatthu 02-05-2014 12:24 AM

You need to investigate the following items:
1. Have you updated the system in recent days, especially the kernel?
2. Do you have active support with Red Hat?
3. If yes for #2, it's best to engage Red Hat support.

iprince 02-05-2014 12:47 AM

Hi myatthu,

Thanks for the feedback.

FYI, no maintenance or update activity was carried out recently; the issue hit the cluster suddenly.
Second, the support contract (RHN) is still valid, and I've logged this issue with HP (since I bought the support from HP). HP did the troubleshooting at the hardware and OS levels (and found everything OK), but they can't assist me with cluster-level troubleshooting because my cluster is set up with 'pacemaker' instead of the Red Hat cluster tools (luci/ricci). They advised me to contact pacemaker support :(

I'm stuck and really hope that someone in this forum can assist or advise me on this issue.

Thanks a lot.

myatthu 02-05-2014 04:49 AM

What are your RHEL and heartbeat versions?
Code:

cat /etc/redhat-release
Code:

rpm -qi heartbeat

iprince 02-05-2014 07:13 PM

Hi Myatthu,

Please see below:

Quote:

Originally Posted by myatthu (Post 5111919)
What are your RHEL and heartbeat versions?
Code:

cat /etc/redhat-release
Code:

rpm -qi heartbeat

Red Hat Enterprise Linux Server release 6.2 (Santiago)

Name : heartbeat
Arch : x86_64
Version : 3.0.4

myatthu 02-05-2014 08:45 PM

Yeah, your environment is a pretty recent version.

Can you provide the following outputs? You may omit real IPs and the secret key.

Code:

last
Code:

uname -a
Code:

cat /etc/ha.d/ha.cf
Code:

cat /etc/ha.d/haresources
I just want to be sure that it is really caused by the cluster package.
Can you also grep for ERROR in /var/log?

Code:

grep -i error /var/log/*
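
A grep across everything in /var/log tends to return a lot of unrelated lines, so it can help to narrow the search to heartbeat's own ERROR and WARN entries. The sketch below is an illustration only: the function name `heartbeat_problems` is hypothetical, and it assumes heartbeat logs through syslog with `heartbeat` as its tag and writes to `/var/log/messages`, which may not match every setup.

```shell
# Print ERROR and WARN lines logged by heartbeat from a single log file.
# Assumption: heartbeat logs via syslog and tags its lines with "heartbeat".
heartbeat_problems() {
    logfile="${1:-/var/log/messages}"
    # Match lines that mention heartbeat and carry an ERROR: or WARN: severity.
    grep -E 'heartbeat.*(ERROR|WARN):' "$logfile"
}
```

Running `heartbeat_problems /var/log/messages | tail -n 50` would show the most recent cluster-related problems without the noise from other services.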

iprince 02-09-2014 08:40 PM

Hi myatthu,

Sorry for the late reply.
I've collected all the output/logs as requested.
You can retrieve them from the link below:

https://www.dropbox.com/sh/k083f71s2aoe2f9/emmr3fCzbe

I appreciate your advice.

Thanks.

myatthu 02-10-2014 09:40 AM

How are the two nodes connected? Is it a direct cable or a routed network?
You might want to adjust the following settings if your network latency is high:
warntime 20
deadtime 30
initdead 30

You might also want to check CPU usage during that period. Again, I'm just guessing at some possibilities.
You should ask the heartbeat community for more detailed analysis.

Good luck.
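
Before changing the timing values suggested above, it is worth checking what the node currently uses. The following is a minimal sketch with a hypothetical function name, `show_ha_timings`, that lists the timing directives set in `/etc/ha.d/ha.cf`. The values are in seconds unless a unit is given. As far as the sample ha.cf shipped with heartbeat goes, `initdead` is recommended to be at least twice `deadtime`, so an `initdead` equal to `deadtime` may be worth raising.

```shell
# List the heartbeat timing directives currently set in an ha.cf file.
# Commented-out lines are skipped because the pattern is anchored at line start.
show_ha_timings() {
    conf="${1:-/etc/ha.d/ha.cf}"
    grep -E '^[[:space:]]*(keepalive|warntime|deadtime|initdead)[[:space:]]' "$conf"
}
```

Running `show_ha_timings` on both nodes and comparing the output is a quick way to confirm that the two configurations agree before and after any change.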


All times are GMT -5.