Sorry about the phrasing of the subject. Here's what I'd like to do:
We have a nagios server -- let's call it 18.104.22.168 -- and we want to run nrpe as a daemon on a remote system -- let's call it new_client, with an ip of 22.214.171.124. The problem is we can't see 126.96.36.199 from 188.8.131.52. But 184.108.40.206 can! So what we want to do is run check_nrpe from 220.127.116.11 (the nagios server) to 18.104.22.168, which in turn would execute check_nrpe to 22.214.171.124. Is there a way I can do this without installing two nagios servers?
Can I just install nagios plugins and nrpe on all three machines? Do I need to make any special changes in the nrpe.cfg file if I'm running nrpe as a daemon? In other words, I wouldn't need to specify the nagios IP and remote host IPs in the nrpe.cfg file, right?
Update: For now, I just want the nagios server to be able to send a request to the nrpe daemon and have the request processed on that remote host. The problem right now is that nrpe is not starting up. I don't see it running on the remote host.
How about some SSH tunneling? You can set up a tunnel from your Nagios host to the intermediate one, which then lets you connect from the Nagios host via the intermediary to the target host. See http://souptonuts.sourceforge.net/sshtips.htm for details.
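A minimal sketch of such a tunnel, run from the Nagios host. The host names, the nagios login and local port 5667 are placeholders invented for illustration, not values from this thread. The script only builds and prints the commands, so nothing connects until you run them yourself.

```shell
# Placeholders (assumptions): replace with your own hosts and login.
INTERMEDIATE="intermediate.example.com"   # host that can reach both sides
TARGET="new_client.example.com"           # NRPE host the Nagios server cannot see
LOCAL_PORT=5667                           # any free port on the Nagios server

# Local port forward: connections to localhost:$LOCAL_PORT on the Nagios
# server travel over SSH to $INTERMEDIATE, which connects on to $TARGET:5666.
# -f backgrounds ssh after authentication, -N runs no remote command.
TUNNEL_CMD="ssh -f -N -L ${LOCAL_PORT}:${TARGET}:5666 nagios@${INTERMEDIATE}"
echo "$TUNNEL_CMD"

# With the tunnel up, check_nrpe talks to the local end of it.
CHECK_CMD="/usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -p ${LOCAL_PORT} -c check_ping"
echo "$CHECK_CMD"
```

With a forward like this, the NRPE daemon on the target sees the connection arriving from the intermediate host, so that is the address its nrpe.cfg would need to allow.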
Haven't done it but would suggest what UnSpawn said.
As to nrpe.cfg - the only thing it needs so far as the Nagios server is concerned is an allowed_hosts line. This should be the IP that the Nagios server is coming in through, which with tunneling may appear to be different from the real host's IP. That line is just allowed_hosts=<that IP>.
nrpe.cfg is mainly used to tell the NRPE on the host being monitored what to monitor and what port to use.
Interesting suggestions and advice. Thank you. I'll have to look into ssh tunneling; that's a new one to me. But I wonder if I can do something as simple as passing check_nrpe into check_nrpe, as well as the command I want to execute on the target remote host, like this:
Whenever nagios sees check_nrpe, it passes along the arguments to be executed by the remote host. Maybe I'd have to create services and hosts configuration files on the intermediate host or something, I dunno. I'd have to somehow tell the intermediate host to pass it to the target remote host. That ssh tunneling may be the trick to all this.
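One way the "check_nrpe inside check_nrpe" idea might look, as a sketch only: the intermediate host's nrpe.cfg defines a command that itself calls check_nrpe against the target. The addresses 192.0.2.20 and 192.0.2.30 and the command name check_target_ping are made up for illustration. It also assumes the check_nrpe plugin is installed on the intermediate host and that the target's nrpe.cfg allows the intermediate host's IP.

```shell
# Sketch: write an example nrpe.cfg fragment to a temp file for inspection.
# 192.0.2.30 = target host (placeholder), check_target_ping = hypothetical name.
FRAGMENT="$(mktemp)"
cat > "$FRAGMENT" <<'EOF'
# In nrpe.cfg on the INTERMEDIATE host:
command[check_target_ping]=/usr/local/nagios/libexec/check_nrpe -H 192.0.2.30 -c check_ping
EOF
cat "$FRAGMENT"

# The Nagios server then asks the intermediate host (placeholder 192.0.2.20)
# to run that command; the intermediate host relays it to the target.
NAGIOS_SIDE="/usr/local/nagios/libexec/check_nrpe -H 192.0.2.20 -c check_target_ping"
echo "$NAGIOS_SIDE"
```

No second Nagios server is involved in this layout; the intermediate host only runs the NRPE daemon plus the check_nrpe plugin.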
But first, I have to get it running from the server to any host. So far, the nrpe daemon is not running.
This is very aggravating. I can't get nrpe to work, at all! Here's what I have:
Nagios server (let's call it 126.96.36.199):
Here, I'm just concerned about services.cfg, hosts.cfg, checkcommands.cfg and nrpe.cfg.
Host nrpe_client is defined in the hosts.cfg file, and nagios recognizes it with the check-host-alive command, which is what I told it to do in the hosts.cfg file. Right.
In the services.cfg file, I typed in for the command, "check_nrpe!check_ping!100.0,20%!500.0,60%" (without quotes).
In checkcommands.cfg, I have the following: command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -a $ARG2$ $ARG3$ $ARG4$
I also have the following for check_ping: command_line $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
In the nrpe.cfg file, server_address=188.8.131.52 and allowed_hosts=127.0.0.1, 184.108.40.206
On the NRPE host, I'm just concerned about nrpe.cfg.
In nrpe.cfg, server_address=220.127.116.11, and allowed_hosts=127.0.0.1, 18.104.22.168
command[check_ping]=/usr/local/nagios/libexec/check_ping -H $ARG1$ -w $ARG2$ -c $AGR3$ -p 5
I type in ./nrpe -c nrpe.cfg -d, press Enter, and nothing. Nagios says connection refused by host. I'm not sure what I'm doing wrong here.
From what you wrote it sounds almost as if you're configuring nrpe.cfg on the Nagios host rather than the NRPE host.
Sort of. Actually, I was configuring it on both the Nagios host and the NRPE host.
This is important to understand for troubleshooting: How does the nagios host communicate with the NRPE host?
If I wanted to ping any host, it'll execute the check_ping command on the Nagios host. It knows which host to ping, because the IP of that host is defined in the services.cfg file. But what about the NRPE host? The check_nrpe command executes on the NRPE host, not the Nagios host, right? So how is the Nagios host supposed to know the NRPE host executes the command?
The way I understand it, the Nagios host sees check_nrpe and sends the command to the NRPE host, not unlike sending any checkcommand to any host. The difference is that the NRPE daemon catches the command, strips check_nrpe and passes on the arguments after it (e.g. check_http). Then, the NRPE host executes those arguments. I think that sounds right, but that's why I'm asking, because I can't troubleshoot anything if I don't understand how it works.
So, why is it that I read for the NRPE installation you have to install it on both the NRPE host and the Nagios host?
<host_address> = The IP address of the host running the NRPE daemon
[port] = The port on which the daemon is running - default is 5666
[command] = The name of the command that the remote daemon should run
[to_sec] = Number of seconds before connection attempt times out.
Default timeout is 10 seconds
This plugin requires that you have the NRPE daemon running on the remote host.
You must also have configured the daemon to associate a specific plugin command
with the [command] option you are specifying here. Upon receipt of the
[command] argument, the NRPE daemon will run the appropriate plugin command and
send the plugin output and return code back to *this* plugin. This allows you
to execute plugins on remote hosts and 'fake' the results to make Nagios think
the plugin is being run locally.
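The lookup step that usage text describes can be imitated locally. This is only an illustration of the idea (command name in, configured command line out), not how the daemon is implemented. The function name nrpe_lookup and the sample config entries are made up for the example.

```shell
# Imitates the daemon's lookup step only: map a command name to the
# command line configured for it in an nrpe.cfg-style file.
nrpe_lookup() {
    cfg_file="$1"
    name="$2"
    # Print whatever follows "command[name]=" on the first matching line.
    sed -n "s/^command\[$name]=//p" "$cfg_file" | head -n 1
}

# Hypothetical sample config, written to a temp file.
SAMPLE_CFG="$(mktemp)"
cat > "$SAMPLE_CFG" <<'EOF'
allowed_hosts=127.0.0.1
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
EOF

nrpe_lookup "$SAMPLE_CFG" check_load
# prints: /usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
```

So check_nrpe itself runs on the Nagios host and only sends the short name (check_load here); the full command line lives in nrpe.cfg on the monitored host and runs there.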
The config file that would be relevant on the Nagios host would be checkcommands.cfg. This is just so you can use shorthand in the services.cfg.
So in my checkcommands.cfg file I have (among other things):
In services.cfg I use the above defined command as for example:
service_description # CPU Utilization
contact_groups ux-admins, noc-op
The above check command translates to:
/usr/local/nagios/libexec/check_nrpe -H <each host in host group 11> -c check_cpu -to 120.
The -to 120 tells it to timeout in 2 minutes.
Note that the usage mentions specifying a port. You could do that, but the default is 5666, and whichever port you use has to match the one set in nrpe.cfg on the remote host.
The hosts are defined in hosts.cfg and hostgroups.cfg. You can define a service by either. Typically we put all our hosts in host groups and do the common things (cpu checks, memory checks etc...) at the host group level then do specific things (e.g. filesystem checks, web server checks) per host as they aren't the same on all hosts.
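To make the "!" shorthand concrete, here is a rough imitation of how a check_command value gets split and dropped into a command_line template. It uses the check_ping service line and the check_nrpe command_line quoted earlier in the thread. The function expand_check_command is invented for this sketch and is not Nagios code; it fills only the $ARGn$ macros and leaves everything else alone.

```shell
# Rough imitation of the "command!arg1!arg2" shorthand: split on "!" and
# substitute the pieces for $ARG1$, $ARG2$, ... in a command_line template.
expand_check_command() {
    template="$1"        # command_line from checkcommands.cfg
    check_command="$2"   # check_command value from services.cfg
    old_ifs="$IFS"; IFS='!'
    set -- $check_command    # $1 = command name, $2 onward = arguments
    IFS="$old_ifs"
    shift                    # drop the command name itself
    n=1
    for arg in "$@"; do
        # sed sees the pattern \$ARGn\$ (a literal $ARGn$).
        template="$(printf '%s' "$template" | sed "s|\\\$ARG$n\\\$|$arg|g")"
        n=$((n + 1))
    done
    printf '%s\n' "$template"
}

expand_check_command \
    '$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -a $ARG2$ $ARG3$ $ARG4$' \
    'check_nrpe!check_ping!100.0,20%!500.0,60%'
# prints: $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_ping -a 100.0,20% 500.0,60% $ARG4$
```

$USER1$ and $HOSTADDRESS$ are left untouched here because Nagios fills those from its own configuration and the host definition. The leftover $ARG4$ shows that only three arguments were supplied after the command name.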
Thank you very much for the detailed explanation. I sincerely appreciate it. You should've written the FAQ for it lol
I followed it like you said. Now I'm getting "(Return code of 127 is out of bounds - plugin may be missing)" (without quotes), which is better than "connection refused by host". It's also listed as CRITICAL.
Here's the relevant information I have on the Nagios host:
# 'check_nrpe' command definition
command_line check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -a $ARG2$ $ARG3$ $ARG4$
Here's the relevant information I have on the NRPE host:
allowed_hosts=127.0.0.1, (IP of nagios host)
command[check_ping]=/usr/local/nagios/libexec/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $AGR2$ -p 5
I still can't see the nrpe daemon running; so, I must've made enough changes to have nagios report this new error of 127 out of bounds. I should note that the Nagios host does have check_nrpe in its libexec for the plugins; however, the NRPE host does NOT have check_nrpe in its libexec. Could this be the problem?
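A quick local demonstration of where a 127 can come from, not specific to Nagios: a POSIX shell returns exit status 127 when it cannot find the command it was asked to run. The command name below is deliberately nonexistent. One plausible reading, offered as a guess, is that the command_line quoted above calls check_nrpe without a path, unlike the $USER1$/check_nrpe form used earlier in the thread.

```shell
# Ask a shell to run a command that does not exist and capture its status.
STATUS=0
sh -c 'no_such_plugin_for_demo -H 127.0.0.1' 2>/dev/null || STATUS=$?
echo "exit status: $STATUS"
# prints: exit status: 127
```

If that is what is happening, the error would come from the Nagios host failing to find the plugin, before anything reaches the NRPE host.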
127.0.0.1 is not the IP of any host but rather the IP of "localhost" (a/k/a loopback). It always refers to the system you are on. That is to say on your Nagios host 127.0.0.1 refers to the Nagios host but on the NRPE host 127.0.0.1 refers to the NRPE host rather than the Nagios host. On a Windoze workstation 127.0.0.1 would refer to the Windoze workstation. This is not a Nagios/NRPE thing but a basic networking concept.
Since the nrpe.cfg shows only 127.0.0.1 allowed you're basically telling the NRPE host it can only talk to itself.
Run "ifconfig" on your Nagios host. Assuming it is Linux you'll see an entry for lo (loopback) but should also see another entry for your NIC. It is the Nagios host NIC's IP that should be in the nrpe.cfg on the NRPE host. The allow is saying "allow the specified Nagios host to talk to me" (me being the NRPE host).
As noted in my prior post check_nrpe is a command and should be in libexec on the Nagios host. nrpe.cfg is a configuration file and should be on the NRPE host. So the answer to your final question is no - you have it right so far as where the command is.
Last edited by MensaWater; 02-02-2007 at 12:59 PM.
No, no, you misunderstood. Under allowed_hosts, I had 127.0.0.1 and the IP of the nagios host. Yea, I know that 127.0.0.1 is a loopback to the localhost; I did that intentionally, instead of writing the NRPE host IP. For some reason, the nrpe daemon is not listening on the nrpe host, which is my current theory for why it's not working.
I checked using netstat and chkconfig and it's not there, but it's listed in /etc/services as nagios-nrpe with port 5666. It's very perplexing.
/etc/services simply associates the port number with a service name, so that tools such as netstat and lsof can show the name instead of 5666. It doesn't reserve the port, and it doesn't actually RUN the daemon itself.
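Since /etc/services is only a lookup table, you can query it like one. The sketch below uses a small sample file so it runs anywhere; the helper service_name_for_port is made up for the example, and the sample entry mirrors the nagios-nrpe name mentioned above.

```shell
# Look up the service name for a port in a services-format file
# (defaults to /etc/services when no file is given).
service_name_for_port() {
    port="$1"; proto="${2:-tcp}"; file="${3:-/etc/services}"
    awk -v want="$port/$proto" '$1 !~ /^#/ && $2 == want { print $1; exit }' "$file"
}

# Hypothetical sample file in /etc/services format.
SAMPLE_SERVICES="$(mktemp)"
cat > "$SAMPLE_SERVICES" <<'EOF'
# name        port/proto   comment
ssh           22/tcp
nagios-nrpe   5666/tcp     # Nagios Remote Plugin Executor
EOF

service_name_for_port 5666 tcp "$SAMPLE_SERVICES"
# prints: nagios-nrpe
```

Finding the entry only tells you the name is mapped; it says nothing about whether a daemon is actually listening.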
You have to start the command "nrpe" on the NRPE host. This is what becomes the daemon. Until you see it running and listening on port 5666 (or whatever port you assign it) then Nagios's check_nrpe won't be able to talk to it.
The start command for nrpe on my NRPE Linux host is:
/usr/local/nagios/libexec/nrpe -c /usr/local/nagios/etc/nrpe.cfg --daemon
There should be a startup script for this in your rc setup. Mine is /etc/init.d/nrpe on a RHEL AS4 system.
If "ps -ef |grep nrpe" doesn't show it running then it isn't a daemon so isn't listening.
By the way "lsof -i :5666" is a quick way to see if anything is listening on port 5666. It will show you the process and its PID as well.
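The two checks above can be wrapped into one small helper to run on the NRPE host. This is a convenience sketch; the function name nrpe_status is made up, and it assumes the daemon's process name contains "nrpe". Run it as root, since lsof may not show sockets owned by other users otherwise.

```shell
# Report whether an nrpe process exists and whether anything is using the port.
# 5666 is the default NRPE port; pass a different port as the first argument.
nrpe_status() {
    port="${1:-5666}"
    # The [n] bracket keeps grep from matching its own process line.
    if ps -ef | grep '[n]rpe' >/dev/null; then
        echo "process: nrpe appears to be running"
    else
        echo "process: nrpe is NOT running"
    fi
    if command -v lsof >/dev/null 2>&1; then
        if lsof -i ":$port" >/dev/null 2>&1; then
            echo "port $port: in use (run lsof -i :$port for details)"
        else
            echo "port $port: nothing found"
        fi
    else
        echo "port $port: lsof not installed, check skipped"
    fi
}

nrpe_status
```

If the first line says the process is not running, the port check is moot: start the daemon and look again before suspecting firewalls or allowed_hosts.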
Thank you very much for all your help. I'm still learning this world and your advice is helping me a lot. Yea, it's not listening. There might be a restriction on a firewall somewhere (I'm not that familiar with the topology where I'm at). Is there a log file somewhere, where I can look up errors or activity by the system and/or the nrpe program itself? This way, maybe I can get some insight as to why the process is not starting or is bailing out.
The firewall should only affect what things outside the box see. If you're on the NRPE host itself you should be able to run the "ps -ef |grep nrpe" to see if it is running. If it is then run "lsof -p <pid>" on the Process ID. You should see a line like:
nrpe 3938 root 3u IPv4 221358 TCP *:nrpe (LISTEN)
This would show you the process is listening on the TCP port named "nrpe", which would be defined in /etc/services. If you see something other than nrpe then it likely means the port isn't the one you think it is OR /etc/services already had a separate definition. If you see a name, it will be in /etc/services (or NIS services if you're doing NIS). If you see a number, then it isn't defined in /etc/services. That isn't a major problem so long as it is running. As noted above, /etc/services just associates the name with the port number and either tcp or udp.
If the above didn't help let me know - On the NRPE host what do the following show?:
ps -ef |grep nrpe
lsof -i :5666
Well, the way it works in our topology, some systems can't see each other; they're blocked on purpose. Now, the nagios host and this nrpe host are capable of seeing one another. The only reason I brought this up is that I wondered, on a whim, if there's a setting in a firewall someplace that is restricting traffic in a way that affects nrpe.
This is what it shows for ps -ef | grep nrpe:
root 13805 5043 0 15:10 pts/0 00:00:00 grep nrpe
And lsof -i :5666 just returns back to a prompt. In other words, there's no output.
I followed one of the faqs (http://www.nagios.org/docs/ -- it's the word file for installing nagios and nrpe) and executed the below on the nrpe host: