LinuxQuestions.org
Old 02-04-2014, 08:48 PM   #1
iprince
LQ Newbie
 
Registered: Jun 2013
Posts: 9

Rep: Reputation: Disabled
Server Hung with error message: ERROR: Message hist queue is filling up


Hi Guys,

First, please excuse my poor English.

I have a two-node cluster running RHEL with Pacemaker.

Recently, one of the cluster nodes hung (it was pingable, but the KVM console showed a black screen and SSH was not possible). The other cluster node was active and serving the MySQL service, but users reported that the application couldn't access the database.

I had no choice but to reboot the system (to get the console back). After the reboot, the cluster was back to normal.

A lot of ERROR messages appeared (every second) in the messages log file on the affected cluster node, as below:

ERROR: Message hist queue is filling up (500 messages in queue)
WARN: Gmain_timeout_dispatch: Dispatch function for send_reqnodes_msg took too long to execute: 240 ms (> 100 ms) (GSource: 0x9c3f660)

Could anyone advise what went wrong on my system?
Also, how do the above messages relate to the hang?

Your advice is highly appreciated. Thanks.
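
One way to relate the flood of messages to the time of the hang is to count the "Message hist queue is filling up" lines per minute and see when they started. The following is only a rough sketch: the function name `hist_queue_per_minute` is made up for illustration, and it assumes the traditional syslog timestamp layout ("Mon DD HH:MM:SS") at the start of each line, which may not match every system.

```shell
# Count "Message hist queue is filling up" lines per minute in a syslog-style file.
# Assumption: fields 1-3 of each line are "Mon DD HH:MM:SS" (traditional syslog).
# hist_queue_per_minute is a hypothetical helper name, not part of heartbeat.
hist_queue_per_minute() {
    grep -F 'Message hist queue is filling up' "$1" |
        awk '{ print $1, $2, substr($3, 1, 5) }' |   # keep month, day, HH:MM
        uniq -c |                                    # log is chronological, so equal minutes are adjacent
        awk '{ print $2, $3, $4, $1 }'               # print "Mon DD HH:MM count"
}
```

It could be called as `hist_queue_per_minute /var/log/messages` (the path is an assumption; use whichever file holds the heartbeat messages). The first minute listed gives a rough starting point for comparing against the time the node stopped responding.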
 
Old 02-05-2014, 12:24 AM   #2
myatthu
Member
 
Registered: Jan 2014
Distribution: CentOS, Fedora, Ubuntu
Posts: 108

Rep: Reputation: 17
You need to investigate the following items:
1. Have you updated the system in recent days, especially the kernel?
2. Do you have active support with Red Hat?
3. If yes for #2, it is best to open a case with Red Hat support.
 
Old 02-05-2014, 12:47 AM   #3
iprince
LQ Newbie
 
Registered: Jun 2013
Posts: 9

Original Poster
Rep: Reputation: Disabled
Hi myatthu,

Thanks for the feedback.

FYI, no maintenance or update activity was carried out recently; the issue hit the cluster suddenly.
Second, the support contract (RHN) is still valid, and I've logged this issue with HP (since I bought the support from HP). HP did the troubleshooting at the hardware and OS levels (and found everything OK), but they can't assist me with cluster-level troubleshooting because my cluster is set up with Pacemaker instead of the Red Hat cluster tools (luci/ricci). They advised me to contact Pacemaker support.

I'm stuck and really hope that someone in this forum can assist or advise me on this issue.

Thanks a lot.
 
Old 02-05-2014, 04:49 AM   #4
myatthu
Member
 
Registered: Jan 2014
Distribution: CentOS, Fedora, Ubuntu
Posts: 108

Rep: Reputation: 17
What are your RHEL and heartbeat versions?
Code:
cat /etc/redhat-release
Code:
rpm -qi heartbeat
 
Old 02-05-2014, 07:13 PM   #5
iprince
LQ Newbie
 
Registered: Jun 2013
Posts: 9

Original Poster
Rep: Reputation: Disabled
Hi Myatthu,

Please see below:

Quote:
Originally Posted by myatthu View Post
What are your RHEL and heartbeat versions?
Code:
cat /etc/redhat-release
Code:
rpm -qi heartbeat
Red Hat Enterprise Linux Server release 6.2 (Santiago)

Name : heartbeat
Arch : x86_64
Version : 3.0.4
 
Old 02-05-2014, 08:45 PM   #6
myatthu
Member
 
Registered: Jan 2014
Distribution: CentOS, Fedora, Ubuntu
Posts: 108

Rep: Reputation: 17
Yeah, your environment is a pretty recent version.

Can you provide the following outputs? You may omit real IPs and the secret key.

Code:
last
Code:
uname -a
Code:
cat /etc/ha.d/ha.cf
Code:
cat /etc/ha.d/haresources
I just want to be sure that it is really caused by the cluster package.
Can you also grep for ERROR in /var/log?

Code:
grep -i error /var/log/*
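
Since `grep -i error /var/log/*` can print a very large number of lines, it may help to first see which files contain matches at all. Below is a minimal sketch under stated assumptions: the function name `error_counts` is hypothetical, and it only looks at regular files directly under the given directory (no recursion, no rotated or compressed logs).

```shell
# Print "<count> <file>" for each regular file directly under a log directory
# that has at least one case-insensitive match for "error", highest count first.
# error_counts is a hypothetical helper name; the directory defaults to /var/log.
error_counts() {
    for f in "${1:-/var/log}"/*; do
        [ -f "$f" ] || continue                       # skip directories and the like
        n=$(grep -ic 'error' "$f" 2>/dev/null) || true # grep exits 1 when the count is 0
        if [ "${n:-0}" -gt 0 ]; then
            printf '%s %s\n' "$n" "$f"
        fi
    done | sort -rn
}
```

Running `error_counts /var/log` as root would then show where most of the matches are, and the full `grep` can be limited to those files.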
 
Old 02-09-2014, 08:40 PM   #7
iprince
LQ Newbie
 
Registered: Jun 2013
Posts: 9

Original Poster
Rep: Reputation: Disabled
Hi myatthu,

Sorry for the late reply.
I've collected all the output/logs as requested.
You can retrieve them from the link below:

https://www.dropbox.com/sh/k083f71s2aoe2f9/emmr3fCzbe

I appreciate your advice.

Thanks.
 
Old 02-10-2014, 09:40 AM   #8
myatthu
Member
 
Registered: Jan 2014
Distribution: CentOS, Fedora, Ubuntu
Posts: 108

Rep: Reputation: 17
How do you connect the two nodes? Is it a direct cable or through a routed network?
You might want to adjust the following settings if your network latency is high.
warntime 20
deadtime 30
initdead 30

You might want to check CPU usage during that period. Again, I'm just guessing at some possibilities.
You should ask the heartbeat community for a more detailed analysis.

Good luck.
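
Before changing the timing directives in `/etc/ha.d/ha.cf`, it may be worth checking how they relate to each other. The sketch below rests on two assumptions that should be verified against the heartbeat documentation for the installed version: that `warntime` is meant to be lower than `deadtime`, and that `initdead` is recommended to be at least twice `deadtime` (as the comments in the sample ha.cf suggest, as far as I recall). The helper names `ha_value` and `check_ha_timing` are made up, and values are assumed to be plain integers in seconds.

```shell
# Read the last value of one directive (e.g. "deadtime 30") from an ha.cf-style file.
# Assumption: plain integer seconds; values with units such as "500ms" are not handled.
ha_value() {
    awk -v key="$2" '$1 == key { v = $2 } END { if (v != "") print v }' "$1"
}

# Print a note for each timing relationship that looks off; return 1 if any was found.
# check_ha_timing is a hypothetical helper, not a heartbeat tool.
check_ha_timing() {
    warn=$(ha_value "$1" warntime)
    dead=$(ha_value "$1" deadtime)
    init=$(ha_value "$1" initdead)
    status=0
    if [ -n "$warn" ] && [ -n "$dead" ] && [ "$warn" -ge "$dead" ]; then
        echo "warntime ($warn) should be lower than deadtime ($dead)"
        status=1
    fi
    if [ -n "$init" ] && [ -n "$dead" ] && [ "$init" -lt $((dead * 2)) ]; then
        echo "initdead ($init) is below twice deadtime ($dead)"
        status=1
    fi
    return "$status"
}
```

With the values suggested above (warntime 20, deadtime 30, initdead 30), this check would flag `initdead`; whether that matters in practice depends on how long the network takes to come up after a reboot.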
 
  


All times are GMT -5.
