Linux - SoftwareThis forum is for Software issues.
Having a problem installing a new program? Want to know which application is best for the job? Post your question in this forum.
Notices
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.
Having a problem logging in? Please visit this page to clear all LQ-related cookies.
Get a virtual cloud desktop with the Linux distro that you want in less than five minutes with Shells! With over 10 pre-installed distros to choose from, the worry-free installation life is here! Whether you are a digital nomad or just looking for flexibility, Shells can put your Linux machine on the device that you want to use.
Exclusive for LQ members, get up to 45% off per month. Click here for more info.
I am running a few self-made plugins using NRPE. It is configured to utilize xinetd to communicate with remote servers. The first few executions of my plugin return the output I expect, with continuous executions returning CHECK_NRPE: Socket timeout after 10 seconds.
I am attempting to use the localhost IP address (127.0.0.1) as the IP for several hosts, each of which would display performance data on Nagiosgraph with rrd files located in the corresponding hosts' rrddir. I can only execute the plugins by passing 127.0.0.1 as the -H argument value (as opposed to using the specific hostname). This is the method of execution which leads to intermittent failures.
are all your server running nrpe also running nagios ?? if not you will have to allow your nrpe to accept at least one server IP that is running nagios.
Thank you for your response. I am running Nagios 3.3.1 and NRPE 2.13 on my main server, which I will refer to as A1. I have modified nrpe.cfg to let A1 recognize itself as an allowed_host. Here is where my plugin differs from a typical case: The plugins I have written use a Perl module to retrieve device statistics from remote servers which do not use NRPE at all. Rather, I use A1 as a hub which locally executes the plugins and thus asks the remote server to send back it's statistics, which are then formatted and returned to Nagios.
Another detail worth noting is that since I am using Nagiosgraph and rrdtool to graphically represent my performance data in the web interface, I needed to create a host object for each device which A1 communicates with even though the -H flag of check_nrpe is always 127.0.0.1 (localhost) for these plugins to use the scripts on A1. I have set the IP addresses for each of these hosts as 127.0.0.1 since the services correspond to one of these hosts and the rrd files are thus written to the appropriate host's rrddir.
As I said before, I am using xinetd with NRPE. I have tried changing the definition of check_nrpe with the inclusion of the -t option in commands.cfg and have also changed the value of command_timeout in nrpe.cfg - each to no avail. Is this because -t and command_timeout are only processed if the NRPE daemon is used?
I have also debugged the plugins and checked the logs to find any clues. As expected, the xinetd entry for the attempted execution of the plugin contains nrpe signal=13 and the duration exceeds ten seconds. However, in successful tries the nrpe status=0 and the durations of successful tries are generally 20 seconds or less. This is what confused me, because my timeout settings are well over a minute but NRPE seems to perceive the threshold as about 20-30 seconds. Here are example log entries:
May 24 14:31:35 A1 xinetd[12471]: START: nrpe pid=28120 from=127.0.0.1
May 24 14:31:35 A1 xinetd[12471]: EXIT: nrpe status=0 pid=27640 duration=20(sec)
May 24 14:31:36 A1 xinetd[12471]: EXIT: nrpe status=0 pid=27662 duration=20(sec)
May 24 14:31:41 A1 xinetd[12471]: EXIT: nrpe status=0 pid=27709 duration=14(sec)
May 24 14:31:42 A1 xinetd[12471]: START: nrpe pid=28373 from=127.0.0.1
May 24 14:31:49 A1 xinetd[12471]: EXIT: nrpe status=0 pid=28373 duration=7(sec)
May 24 14:32:36 A1 xinetd[12471]: EXIT: nrpe signal=13 pid=28120 duration=61(sec)
This has to be an issue with NRPE not accepting my timeout values, right? If I attempt to execute a series of service checks manually, the first few may return valid output and perfdata but some of these checks return the socket timeout - leaving gaps in my graphs and discontinuity in my data. The output of the plugins themselves are as follows:
OK - 135 MHz; |mhz=102;109;95;173;144;127;126;167;148;167;
OK - 5 disk; |disk=2;2;22;2;2;2;2;5;9;3;
OK - 60%; |pct=46;40;74;61;54;54;71;63;71;75;
CHECK_NRPE: Socket timeout after 10 seconds.
Lastly, I want to note that I have set the correct file permissions for nagios.nagios to access them and the directories which are involved in the execution of these plugins. Thank you in advance for any help you can offer.
I figured out the issue here. The change to the timeout variable did take effect, but there was also a change to the nrpe command. In adding the -t 120 flag to the command, it changed the configuration so that in order for NRPE to use the correct timeout value which was set, the -t 120 flag must be in every command run manually.
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.