Nagios "No Route to Host" on CentOS

kapshure · 11-23-2010, 05:35 PM

I've got a Nagios server and a monitored node, both on CentOS 5. I initially had a problem with SSH key exchange, but that has been solved, and I'm still receiving a "No route to host" error.

Nagios server: 10.0.100.130
monitored node: 10.0.100.143

Yet, I can do the following from Nagios Server:

Code:

/usr/local/nagios/libexec/check_tcp -H 10.0.100.143 -p 5666
TCP OK - 0.000 second response time on port 5666|time=0.000361s;0.000000;0.000000;0.000000;10.000000

I can also do this from the Nagios server:

Code:

ssh 10.0.100.143 /usr/local/nagios/libexec/check_procs 
PROCS OK: 603 processes

I can successfully ping 10.0.100.143 from Nagios server as well.

grep for the monitored node in /var/log/messages pulls this up:

Code:

Nov 10 00:00:00 nagiosbox nagios: CURRENT HOST STATE: monitorednode;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 0.21 ms 

Nov 10 00:00:00 nagiosbox nagios: CURRENT SERVICE STATE: monitorednode;Home Page;CRITICAL;HARD;1;No route to host

route and ifconfig info

Code:

from monitored node:

ping 10.0.100.130
PING 10.0.100.130 (10.0.100.130) 56(84) bytes of data.
64 bytes from 10.0.100.130: icmp_seq=1 ttl=64 time=0.897 ms

monitored node ifconfig:

ifconfig
eth0      Link encap:Ethernet  HWaddr 00:1D:09:2C:C3:2A  
          inet addr:10.0.100.143  Bcast:10.0.100.255  Mask:255.255.255.0
          inet6 addr: fe80::21d:9ff:fe2c:c32a/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:151840310 errors:0 dropped:0 overruns:0 frame:0
          TX packets:20026487 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:145578488128 (135.5 GiB)  TX bytes:2364444581 (2.2 GiB)
          Interrupt:169 Memory:f8000000-f8012800 

eth0:1    Link encap:Ethernet  HWaddr 00:1D:09:2C:C3:2A  
          inet addr:10.0.100.144  Bcast:10.0.100.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:169 Memory:f8000000-f8012800 

"route" from monitored node:

 route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.0.100.0      *               255.255.255.0   U     0      0        0 eth0
169.254.0.0     *               255.255.0.0     U     0      0        0 eth0
default         10.0.100.1      0.0.0.0         UG    0      0        0 eth0



from Nagios box, ifconfig:

/sbin/ifconfig
eth0      Link encap:Ethernet  HWaddr 00:1C:23:C8:96:AE  
          inet addr:10.0.100.130  Bcast:10.0.100.255  Mask:255.255.255.0
          inet6 addr: fe80::21c:23ff:fec8:96ae/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1968825668 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2112609296 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:708043528943 (659.4 GiB)  TX bytes:995965269105 (927.5 GiB)
          Interrupt:169 Memory:f8000000-f8011100 

"route" from nagios box:

 /sbin/route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.0.101.0      *               255.255.255.0   U     0      0        0 eth1
10.0.100.0      *               255.255.255.0   U     0      0        0 eth0
169.254.0.0     *               255.255.0.0     U     0      0        0 eth0
default         10.0.100.1      0.0.0.0         UG    0      0        0 eth0
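
A quick way to double-check the routing tables above: both addresses should fall inside the directly connected 10.0.100.0/255.255.255.0 route on eth0, so traffic between them would not need the default gateway. A minimal Python sketch of that check (the helper name `on_link` is made up for illustration; the addresses and masks are copied from the output above):

```python
import ipaddress


def on_link(destination, genmask, address):
    """Return True if address falls inside the route destination/genmask."""
    network = ipaddress.ip_network(f"{destination}/{genmask}")
    return ipaddress.ip_address(address) in network


# Routes with no gateway ("*") taken from the Nagios box's routing table.
routes = [("10.0.101.0", "255.255.255.0", "eth1"),
          ("10.0.100.0", "255.255.255.0", "eth0")]

for dest, mask, iface in routes:
    print(iface, on_link(dest, mask, "10.0.100.143"))
```

If the eth0 route matches, the kernel has a route to the node, which suggests the "No route to host" message is coming from somewhere other than the routing table itself.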

I have a config file for the monitored node:

Code:

/usr/local/nagios/etc/servers/monitorednode.cfg:


define host{
        use linux-server ; Inherit default values from a template
        host_name monitorednode ; The name we're giving to this server
        alias monitorednode ; A longer name for the server
        address 10.0.100.143 ; IP address of the server
}
define service{
        use generic-service
        host_name                       monitorednode
        service_description             Home Page
        check_command                   check_http!ww2
}

If I do a ./check_http -H 10.0.100.143, I get a connection refused, Unable to open TCP socket. I can't telnet to 80 on that box either.

If I do a ./check_http -H 10.0.100.144, I get:

Code:

OK - HTTP/1.1 301 Moved Permanently - 0.003 second response time |time=0.002535s;;;0.000000 size=434B;;;0

I can telnet successfully to 80 on .144
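
"Connection refused" and "No route to host" are two different errors at the socket level (ECONNREFUSED and EHOSTUNREACH on Linux), so it can help to see which one each address returns on port 80. A minimal Python sketch, not a replacement for check_http; the function name `probe_tcp` and the 3-second timeout are assumptions made for illustration:

```python
import errno
import socket


def probe_tcp(host, port, timeout=3.0):
    """Attempt one TCP connect and describe the outcome in plain words."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "open"
    except socket.timeout:
        # No answer at all within the timeout.
        return "timed out"
    except OSError as exc:
        if exc.errno == errno.ECONNREFUSED:
            return "connection refused"
        if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
            return "no route to host"
        return f"error: {exc}"
    finally:
        sock.close()


# Example (run from the Nagios server):
#   probe_tcp("10.0.100.143", 80)
#   probe_tcp("10.0.100.144", 80)
```

Running it against both .143 and .144 from the Nagios server would show whether the two addresses really behave differently on port 80, independent of Nagios.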

Someone mentioned that this error comes from the OS rather than from Nagios, specifically stating that the "Home Page" check, unlike the check_ping plugin, isn't looking at a valid host name or address. The problem is that I can't find any reference to "Home Page" anywhere.

I got these from /usr/local/nagios/etc/objects/commands.cfg

Code:

# 'check-host-alive' command definition
define command{
        command_name    check-host-alive
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
        }

# 'check_ping' command definition
define command{
        command_name    check_ping
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
        }

# 'check_http' command definition
define command{
        command_name    check_http
        command_line    $USER1$/check_http -I $HOSTADDRESS$ $ARG1$
        }
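
To see what the check_http definition above actually runs for a given service, it helps to expand the macros by hand. Nagios splits check_command on "!" into the command name and $ARG1$, $ARG2$, and so on, and fills $HOSTADDRESS$ from the host's address directive. A rough Python sketch of that substitution (the function name is made up, and the value used for $USER1$ is an assumption based on the plugin paths used elsewhere in this thread):

```python
import re


def expand_check_command(check_command, command_line, host_address, user1):
    """Expand $USER1$, $HOSTADDRESS$ and $ARGn$ for one service check."""
    # "check_http!ww2" -> name "check_http", args ["ww2"]
    name, *args = check_command.split("!")
    expanded = command_line.replace("$USER1$", user1)
    expanded = expanded.replace("$HOSTADDRESS$", host_address)

    def fill_arg(match):
        index = int(match.group(1))
        # Arguments that were not supplied expand to an empty string.
        return args[index - 1] if 1 <= index <= len(args) else ""

    expanded = re.sub(r"\$ARG(\d+)\$", fill_arg, expanded)
    return name, expanded.strip()


name, line = expand_check_command(
    "check_http!ww2",
    "$USER1$/check_http -I $HOSTADDRESS$ $ARG1$",
    "10.0.100.143",
    "/usr/local/nagios/libexec",  # assumed value of $USER1$
)
print(line)
# -> /usr/local/nagios/libexec/check_http -I 10.0.100.143 ww2
```

With the definitions in this thread, "ww2" reaches the plugin as a bare argument with no option flag in front of it, and the address checked is .143 rather than .144.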

Under /etc/rc.d/init.d/nagios I can see that I've got the paths right:

Code:

prefix="/usr/local/nagios"
exec_prefix="/usr/local/nagios"
exec="/usr/local/nagios/bin/nagios"
config="/usr/local/nagios/etc/nagios.cfg"

Code:

Nov  9 00:00:00 nagiosbox nagios: CURRENT SERVICE STATE: monitorednode;Home Page;CRITICAL;HARD;1;No route to host 
Nov 10 00:00:00 nagiosbox nagios: CURRENT HOST STATE: monitorednode;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 0.21 ms 
Nov 10 00:00:00 nagiosbox nagios: CURRENT SERVICE STATE: monitorednode;Home Page;CRITICAL;HARD;1;No route to host

Code:

define host{
        use linux-server ; Inherit default values from a template
        host_name monitorednode ; The name we're giving to this server
        alias monitorednode ; A longer name for the server
        address 10.0.100.143 ; IP address of the server
}
define service{
        use generic-service
        host_name                       monitorednode
        service_description             Home Page
        check_command                   check_http!ww2
}

kapshure · 11-23-2010, 05:47 PM

I went back and checked /var/log/messages on the monitored node and got this:

nrpe[17759]: Error: Could not complete SSL handshake. 5

But I am able to do:

from the monitored node
ssh nagios@nagiosserver

and

from the nagios server
ssh nagios@monitorednode

and log in successfully because SSH key exchange is working. So why the SSL handshake error?

EDIT:

OK, I believe I have found a possible lead on this.

I changed the monitorednode.cfg to this:

define service{
use generic-service
host_name sacdcweb03
service_description HTTP
check_command check_http
}

I took out "Home Page" and "check_http!ww2"; monitorednode.cfg originally read:

service_description Home Page
check_command check_http!ww2

What I get now in /var/log/messages is:

nagios: CURRENT SERVICE STATE: sacdcweb03;HTTP;CRITICAL;HARD;3;Connection refused

The troubleshooting advice for "Connection refused" talks about checking for version differences between the Nagios server and the monitored node where the NRPE daemon is running. I found that the monitored node has 2.12 and the Nagios server has 2.8.

I ran "make clean" and "make uninstall" in the original directory on the monitored node, but I can still execute the check_nrpe plugin and see v2.12 returned.

How do I correctly remove NRPE v2.12 from the monitored node? I suspect that reinstalling the NRPE daemon at 2.8 might clean this up. Do I just go and manually rip out everything the current 2.12 install has put in place?

quanta · 11-23-2010, 08:48 PM

Quote:

Originally Posted by kapshure

I went back and checked /var/log/messages on the monitored node and got this:

nrpe[17759]: Error: Could not complete SSL handshake. 5

Add the Nagios server's IP to the end of the 'allowed_hosts' line in /path/to/nrpe.cfg.

kapshure · 11-23-2010, 11:50 PM

That was already there:

Code:

allowed_hosts=127.0.0.1 10.0.100.130

There shouldn't be a comma between them; I tested that and got a Nagios error on it.

quanta · 11-24-2010, 03:07 AM

Put a comma between them, but don't put a space after the comma. Something like this:

Code:

allowed_hosts=127.0.0.1,192.168.7.63,192.168.7.127
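
If it helps, here is a small Python sketch that rewrites an allowed_hosts line into the form described above (entries separated by commas, no spaces). The helper names are invented for illustration, and it only handles a single line without a trailing newline:

```python
import re


def normalize_allowed_hosts(value):
    """Join the entries with commas and no surrounding whitespace."""
    entries = [e for e in re.split(r"[,\s]+", value.strip()) if e]
    return ",".join(entries)


def fix_allowed_hosts_line(line):
    """Rewrite an 'allowed_hosts=...' line; leave any other line unchanged."""
    key, sep, value = line.partition("=")
    if not sep or key.strip() != "allowed_hosts":
        return line
    return "allowed_hosts=" + normalize_allowed_hosts(value)


print(fix_allowed_hosts_line("allowed_hosts=127.0.0.1 10.0.100.130"))
# -> allowed_hosts=127.0.0.1,10.0.100.130
```

It accepts either spaces or commas as input separators, so it can be run over the existing line whichever way it was written.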