LinuxQuestions.org
04-02-2010, 03:56 AM   #1
eblonk
LQ Newbie
 
Registered: Dec 2009
Distribution: Kubuntu, Debian
Posts: 10

Rep: Reputation: 1
Nagios reports host as down, services as OK


For the past week or two, Nagios has been constantly marking hosts (mainly servers, but also a few serial-over-IP converters) as down for anything up to a few minutes. Typically, all services stay in OK status. Looking closer at a host in such a state, its status information is

CRITICAL - Packet Filtered (<IP address of host in question>)

in soft state.


Sometimes services are in critical state while the host is in OK status. Almost always the status information is "No route to host". Further checking shows no problems. This state rarely lasts longer than one check interval.

This started after a link went down, which correctly put all hosts and services on red for being unreachable. The link problems were solved within a few hours, but Nagios only showed this after 2 reboots. Since then the problems have gradually lessened in frequency, from 5 to 10 of the 34 hosts being reported down at any given moment (and the same for the 150-ish services monitored) to where I am now: 1 to 5 problem statuses (counting both hosts and services) and the occasional 'all green' screen.
A week ago, when the problems had already been going on for a week, Nagios was updated from 3.2.0 to 3.2.1. This showed no apparent improvement.

Still, the Host Groups screen is not stable. A red or yellow status used to signify a problem to be looked at; right now it is likely a false alarm that will go away.

Whenever a "packet filtered" or "no route to host" status is followed up with a ping test, no problems are found, not even the slightest delay or packet loss.
 
04-02-2010, 07:03 AM   #2
carltm
Member
 
Registered: Jan 2007
Location: Canton, MI
Distribution: CentOS, SuSE, Red Hat, Debian, etc.
Posts: 697

Rep: Reputation: 93
Nagios is telling you what it is doing, namely it is running the check-host-alive test and something
is blocking the result from being returned (that's what is meant by Packet filtered). By default the
check-host-alive test is usually a ping. I'd double check what your check-host-alive command is and
then investigate why your network is apparently not allowing responses. You may need to set up more
advanced testing to see what is happening, since Nagios only checks on a schedule and doesn't let you
see what's going on in real time.

Incidentally, it makes sense that the services would appear as okay if pings are being intermittently
dropped, provided that the other tests don't rely on ping.
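
One way to get the "more advanced testing" suggested above is to probe a suspect host far more often than Nagios does and log every miss with a timestamp, so the drops can be lined up against the Nagios event log. The following is only a minimal Python sketch and not part of Nagios: the function names `ping_once` and `watch` are made up for illustration, and it assumes the Linux iputils `ping` (where `-c` is the packet count and `-W` the reply timeout in seconds).

```python
import subprocess
import time
from datetime import datetime


def ping_once(host, timeout_s=1):
    """Send one ICMP echo with the system ping; True if a reply came back."""
    # Assumes Linux iputils ping: -c 1 = one packet, -W = reply timeout (s).
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def watch(host, count, interval_s=1.0, probe=ping_once, sleep=time.sleep):
    """Probe `host` `count` times and return the timestamps of the failures."""
    failures = []
    for _ in range(count):
        if not probe(host):
            stamp = datetime.now().isoformat(timespec="seconds")
            failures.append(stamp)
            print(f"{stamp} no reply from {host}")
        sleep(interval_s)
    return failures
```

For example, `watch("192.0.2.10", 600)` would probe that placeholder address about once a second for ten minutes or so. Run from the Nagios box itself, this shows whether pings really go missing between the scheduled checks.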
 
04-02-2010, 07:07 AM   #3
never say never
Member
 
Registered: Sep 2009
Location: Indiana, USA
Distribution: SLES, SLED, OpenSuse, CentOS, ubuntu 10.10, OpenBSD, FreeBSD
Posts: 195

Rep: Reputation: 37
Sounds to me like there is a problem with the Nagios plugins reaching the host(s) they are to check. Could there be a routing issue, perhaps a bad switch port, maybe even a bad NIC or cable? I bring this up as a possibility because you said there had been a link problem.

One other possible problem comes to mind. A high load of network traffic or CPU usage on the nagios box could cause packets to be received too late to count or not at all. Check the server load and the network load on both the Nagios Server AND the server that is reported.

If this is a fairly frequent problem, creative use of tcpdump and grep might help locate it.

Nagios schedules checks for services and hosts separately, based on your configuration. This is why a host is sometimes marked down while its services are still marked as up (host checks are just pings).

I have a Nagios Server that runs on a box that also does spam filtering. If I get slammed with mail, I will have these issues.
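
As a rough illustration of the tcpdump-and-grep idea, the Python sketch below pairs ICMP echo requests with their replies in captured text and lists the requests that never got an answer. It assumes the usual one-line text format printed by something like `tcpdump -n icmp` on IPv4 (`... IP src > dst: ICMP echo request, id N, seq N, ...`); the function name `unanswered_requests` is made up for this example.

```python
import re

# Matches lines such as:
# 07:03:11.100000 IP 10.0.0.5 > 10.0.0.20: ICMP echo request, id 4242, seq 1, length 64
LINE = re.compile(
    r"^(?P<time>\S+) IP (?P<src>\S+) > (?P<dst>[^:\s]+): "
    r"ICMP echo (?P<kind>request|reply), id (?P<id>\d+), seq (?P<seq>\d+)"
)


def unanswered_requests(lines):
    """Return (time, src, dst, seq) for every echo request without a reply."""
    pending = {}
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue  # not an ICMP echo line in the assumed format
        if m["kind"] == "request":
            pending[(m["src"], m["dst"], m["id"], m["seq"])] = m["time"]
        else:
            # A reply travels the other way, so swap src and dst to find
            # the request it answers.
            pending.pop((m["dst"], m["src"], m["id"], m["seq"]), None)
    return [(t, k[0], k[1], int(k[3])) for k, t in pending.items()]
```

Feeding it a capture taken on the Nagios box and another taken on the monitored host should show on which side the replies disappear. Requests captured just before the capture ended will also show up as unanswered, so the last few entries need a second look.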
 
04-02-2010, 07:57 AM   #4
eblonk
LQ Newbie
 
Registered: Dec 2009
Distribution: Kubuntu, Debian
Posts: 10

Original Poster
Rep: Reputation: 1
carltm -
I made an estimate. 181 hosts and services are monitored and the mentioned statuses are now at an average of 4. There is no host or service being constantly blocked. There is no particular host or service singled out. Having said that, checking the check-host-alive command might give a clue or at the very least I should see what it does.
Initially, after the down-and-up of the WAN link, Nagios kept reporting all hosts and services down. After one reboot the next morning a few came back again; a second reboot that afternoon brought all back, but with 30% dropout. Since then I did a few more reboots, but they had no immediate effect.

never say never -
The issue might indeed be on the box. The problems occur on hosts on the same subnet, on different subnets within the LAN and on subnets reached through a WAN connection. Also, the hosts are a mix of device servers, Windows servers and Linux servers. In the case of the device servers (Quatech ESE-100D), there is no client and only ping and uptime are monitored. The problem is almost certainly just in the traffic between the Nagios box and the clients.
I can't imagine how the sequence of events would have triggered this but there might be something coincidental. I will check the general state of the Nagios box.
 
12-23-2010, 06:59 AM   #5
PRO_wannabe
LQ Newbie
 
Registered: Dec 2010
Posts: 1

Rep: Reputation: 0
eblonk,

I have been seeing similar things with a couple of hosts that are reported as down, but as soon as I see this and check them by manually pinging or ssh-ing to them, they appear fine. The CPU load is not high and I don't see a lot of traffic.

I was just wondering if you ever came to any conclusions that you could pass on regarding your situation of hosts being reported down when they are actually up.

thanks,
Keith
 
03-11-2011, 08:01 AM   #6
carltm
Member
 
Registered: Jan 2007
Location: Canton, MI
Distribution: CentOS, SuSE, Red Hat, Debian, etc.
Posts: 697

Rep: Reputation: 93
eblonk, are you still seeing the problem? I had forgotten about this
thread, and didn't suggest the next step. See if power-cycling the
hubs and/or switches makes any difference. It is possible for some
switches to start acting weird after being on for a long time, especially
if certain network conditions exist.
 
12-04-2011, 01:41 PM   #7
amar.ali
LQ Newbie
 
Registered: Dec 2011
Posts: 1

Rep: Reputation: Disabled
Hi,

I came across this post when googling my problem.

I have Nagios 3 installed on CentOS 5, and Nagios shows hosts as down while they are up!

I believe that everything is fine with my configuration. I'm monitoring 3 hosts. One of them is on the same network as Nagios, and Nagios can ping it. The others are monitored by Nagios through a D-Link router (DIR-100), which acts as their parent. Nagios shows the router (parent) as alive, while the hosts behind it are shown as down although they are up!

Would you help me, please?

thanks,

Ammat
 
06-13-2012, 07:12 AM   #8
eblonk
LQ Newbie
 
Registered: Dec 2009
Distribution: Kubuntu, Debian
Posts: 10

Original Poster
Rep: Reputation: 1
A long time after this post but I want to finish it properly. I left that place where I had that problem not long after my last post, so it is out of my hands.
 
09-08-2012, 10:26 AM   #9
MightyCow
LQ Newbie
 
Registered: Sep 2012
Posts: 1

Rep: Reputation: Disabled
I'm bringing this post back up as I'm having the same issue; I found it via Google.

According to tcpdump, my host (CentOS with Apache and MySQL) is receiving packets from Nagios and responding to them, but Nagios shows it as down. I've reset my CentOS box, set everything to default in httpd.conf and reconfigured it, but have had no luck. Nagios does not respond to my ping requests, but other hosts on my network do. My guess is that Nagios isn't set up to receive my responses, or that there's an issue with the routing.

Network from outside to inside:

Switch --- Vyatta (off) --- NAT --- virtual OS (CentOS and others). Nagios is on a server that is on another Ethernet port on the switch.


Any suggestions?
 
06-27-2013, 03:41 PM   #10
ghandizzle8
LQ Newbie
 
Registered: Jun 2010
Posts: 29

Rep: Reputation: 0
Nagios reports host as down, services as OK

I have the same issue.

I have servers on Amazon and Rackspace. My monitoring server is on Rackspace. The servers on Rackspace are fine, but Nagios says that the servers on Amazon are down even though the services are up. I now realise that Amazon disables ping, or maybe disables it from servers outside the Amazon network. That's the reason why Nagios says my Amazon servers are down.

Hope this helps.

Regards,
Brian
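
When ping is blocked like this, one possible workaround is a host check that does not depend on ICMP, for example one that simply tries a TCP connection to a port the server is known to listen on. Below is a minimal Python sketch of such a check. It follows the usual Nagios plugin convention of one status line plus an exit code (0 = OK, 2 = CRITICAL, 3 = UNKNOWN); the names `check_tcp_alive` and `main` are made up for this example, and how it gets wired into a host's check command is not shown. The standard plugins package also ships a ready-made `check_tcp`, which may well be the simpler choice.

```python
import socket
import sys

OK, CRITICAL, UNKNOWN = 0, 2, 3  # Nagios plugin exit codes


def check_tcp_alive(host, port, timeout_s=5.0):
    """Return (exit_code, message) based on whether a TCP connect succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return OK, f"OK - {host} accepts TCP connections on port {port}"
    except OSError as exc:
        # Covers refused connections, timeouts and unreachable networks.
        return CRITICAL, f"CRITICAL - {host}:{port} unreachable ({exc})"


def main(argv):
    """Command-line wrapper: HOST and PORT are expected as arguments."""
    if len(argv) != 3:
        print(f"UNKNOWN - usage: {argv[0]} HOST PORT")
        return UNKNOWN
    code, message = check_tcp_alive(argv[1], int(argv[2]))
    print(message)
    return code


# Only act as a command-line plugin when arguments are supplied.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

With a check like this, the host shows as up whenever the chosen port (say 22 or 80) answers, regardless of whether ping gets through.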
 
  


All times are GMT -5.
