Old 02-02-2007, 02:56 PM   #16
MensaWater
Guru
 
Registered: May 2005
Location: Atlanta Georgia USA
Distribution: Redhat (RHEL), CentOS, Fedora, Debian, FreeBSD, HP-UX, Solaris, SCO
Posts: 6,009
Blog Entries: 5

Rep: Reputation: 785

Your first command (the ps) shows nrpe is not running.

Your second command confirms nothing else is using port 5666. (Of course nrpe isn't since it isn't running as shown by the first command.)

You need to try starting the nrpe daemon on your NRPE host:
/usr/local/nagios/libexec/nrpe -c /usr/local/nagios/etc/nrpe.cfg --daemon

Substitute the correct path wherever I show /usr/local/nagios, of course.

What happens when you run that? Any messages?

After you run that, rerun the ps command. Does it show a process other than the grep?

If it does, run lsof -p <pid> on the process shown by ps.
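
The ps / lsof steps above can be sketched as a small shell helper. This is only a sketch: nrpe_pids is a made-up name, and it assumes ps -ef style output where the second column is the PID.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -ef` style output on stdin and print the
# PID (assumed to be column 2) of every line that mentions "nrpe",
# skipping the grep process itself.
nrpe_pids() {
    awk '/nrpe/ && !/grep/ { print $2 }'
}

# Usage on the NRPE host (not run here):
#   ps -ef | nrpe_pids
#   for pid in $(ps -ef | nrpe_pids); do lsof -p "$pid"; done
```

If the helper prints nothing, the daemon is not running and there is nothing to hand to lsof.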
 
Old 02-02-2007, 04:16 PM   #17
Mangenius
Member
 
Registered: Jan 2007
Posts: 30

Original Poster
Rep: Reputation: 15
Thanks.

Yeah, I've tried running it and nothing happens. It just returns to a prompt. Same thing happens as before with the ps and lsof commands. Maybe I can try reinstalling.
 
Old 02-03-2007, 10:08 AM   #18
MensaWater
Guru
 
Registered: May 2005
Location: Atlanta Georgia USA
Distribution: Redhat (RHEL), CentOS, Fedora, Debian, FreeBSD, HP-UX, Solaris, SCO
Posts: 6,009
Blog Entries: 5

Rep: Reputation: 785
One thing. Make sure the nrpe.cfg you're editing is the one in your INSTALLDIR/etc. On my system there is a sample nrpe.cfg in another directory but this isn't read on start of nrpe.

Try doing grep nrpe /var/log/messages after you try the start to see what it's putting in your log file.
 
Old 02-06-2007, 03:17 PM   #19
MensaWater
Guru
 
Registered: May 2005
Location: Atlanta Georgia USA
Distribution: Redhat (RHEL), CentOS, Fedora, Debian, FreeBSD, HP-UX, Solaris, SCO
Posts: 6,009
Blog Entries: 5

Rep: Reputation: 785
Did you give up?
 
Old 02-07-2007, 10:28 AM   #20
Mangenius
Member
 
Registered: Jan 2007
Posts: 30

Original Poster
Rep: Reputation: 15
Ah hah!

Well, I'm running with the nrpe.cfg file from the location I specified at the command line; so, why would the contents of the nrpe.cfg from another location be of any concern?

I found two things:

I logged onto the nrpe host machine and looked at /var/log/messages and found the following after I tried starting the nrpe daemon (today):

Feb 7 No variable value specified in config file '/usr/local/nagios/nrpe.cfg' - Line 192
Feb 7 Config file '/usr/local/nagios/nrpe.cfg' contained errors, aborting...
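
For log lines like these, a small helper can pull out the file name and line number and print the offending line. A rough sketch only: show_flagged_line is a made-up name, and it assumes the message format is exactly as in the first log line above.

```shell
#!/bin/sh
# Hypothetical helper: given one log line of the form
#   ... config file '<path>' - Line <n>
# print "<path>:<n>: <text of that line>".
show_flagged_line() {
    log_line=$1
    # Path between the single quotes after "config file"
    cfg=$(printf '%s\n' "$log_line" | sed -n "s/.*config file '\([^']*\)'.*/\1/p")
    # Number after "- Line"
    num=$(printf '%s\n' "$log_line" | sed -n 's/.*- Line \([0-9][0-9]*\).*/\1/p')
    if [ -z "$cfg" ] || [ -z "$num" ]; then
        echo "could not parse log line" >&2
        return 1
    fi
    echo "$cfg:$num: $(sed -n "${num}p" "$cfg")"
}

# Usage (not run here):
#   show_flagged_line "$(grep nrpe /var/log/messages | grep ' - Line ' | tail -1)"
```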

And I also found an additional copy of NRPE on this machine. Apparently, someone else already installed a copy, and I recall reading somewhere that the package system for Gentoo, emerge, doesn't check for previously installed copies. I don't know if that's correct, but just thought I'd throw that out there. I'm not sure if this additional copy would affect anything but there ya go.

Thanks for the help and, no, I haven't given up yet.
 
Old 02-07-2007, 10:33 AM   #21
Mangenius
Member
 
Registered: Jan 2007
Posts: 30

Original Poster
Rep: Reputation: 15
Feb 7 Starting up daemon
Feb 7 Warning: Daemon is configured to accept command arguments from clients!
Feb 7 Listening for connections on port 5666
Feb 7 Allowing connections from: (the IP of the NRPE machine), (the IP of the Nagios machine)
 
Old 02-07-2007, 10:58 AM   #22
MensaWater
Guru
 
Registered: May 2005
Location: Atlanta Georgia USA
Distribution: Redhat (RHEL), CentOS, Fedora, Debian, FreeBSD, HP-UX, Solaris, SCO
Posts: 6,009
Blog Entries: 5

Rep: Reputation: 785
Looks like you got it running.

I'm assuming ps and lsof now show the nrpe daemon running.
 
Old 02-07-2007, 12:33 PM   #23
Mangenius
Member
 
Registered: Jan 2007
Posts: 30

Original Poster
Rep: Reputation: 15
yes, it's running. Thank you. Unfortunately, I'm still getting an error in nagios that's reporting UNKNOWN and check_ping: %s: Warning threshold must be integer or percentage!

So, I'll have to look into stuff like that. Once I can get that running, my ORIGINAL problem (lol) was bouncing into another machine from an intermediate host. SSH tunneling may be the answer to that, but I'll see if there's some way I can put the nrpe daemon on the intermediate host and the target host and somehow get check_nrpe to forward.

thanks again for your help.
 
Old 02-07-2007, 01:59 PM   #24
Mangenius
Member
 
Registered: Jan 2007
Posts: 30

Original Poster
Rep: Reputation: 15
I can't seem to remedy this warning threshold problem. This is what I found when I typed the problem in google:

Quote:
Re: [Nagios-users] check_ping: %s: Warning threshold must be integer or percentage!

OK this is what's happening. Firstly, check_ping on your Nagios machine
is ignored by nrpe - as that contains it's own command definitions. So,
here:
command[check_ping]=/usr/lib/nagios/plugins/check_ping -H $ARG1$ -w
$ARG2$ -c $AGR3$ -p 5

it's expecting the hostname as argument 1, the warning threshold as
argument 2 and the critical threshold as argument 3. (depending on what
your check_nrpe command reads (something like "check_nrpe -H
$HOSTADDRESS$ -p 5666 -c $ARG1$"), what Nagios is actually passing is:

"check_ping!100.0,20%!500.0,60%", so your client machine is reading the
host (argument 1) as 100.0,20%, the warning threshold (argument 2) as
500.0,60% and the critical threshold as blank.

http://www.mail-archive.com/nagios-u.../msg06898.html
 
Old 02-07-2007, 03:48 PM   #25
MensaWater
Guru
 
Registered: May 2005
Location: Atlanta Georgia USA
Distribution: Redhat (RHEL), CentOS, Fedora, Debian, FreeBSD, HP-UX, Solaris, SCO
Posts: 6,009
Blog Entries: 5

Rep: Reputation: 785
I suspect what you read wasn't correct. The:
"check_ping!100.0,20%!500.0,60%"

Looks like the line in services.cfg on the Nagios host. His explanation appears to ME to incorrectly assume that check_ping is just calling the executable in /usr/local/nagios/libexec and passing the parameters after the "!" above directly in. Not impossible but unlikely. What is MORE likely is that the check_ping seen in services.cfg is calling a DEFINED command in checkcommands.cfg. In checkcommands.cfg it likely has syntax similar to:

Code:
# 'check_ping' command definition
define command{
        command_name    check_ping
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5 -t 60
        }
Based on THAT definition what is really being passed would depend a lot on what else is defined.

e.g. $HOSTADDRESS$ is a built-in macro that resolves to the address of the host named on the host_name line in services.cfg. $ARG1$ and $ARG2$ are built in too and represent the two parameters passed (100.0,20% and 500.0,60%; note it is the ! that separates the arguments, not the comma).
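
To make the "!" splitting concrete, here is a rough shell sketch that breaks a check_command value apart the way described above. split_check_command is a hypothetical helper for illustration only; it is not part of Nagios, which does this internally.

```shell
#!/bin/sh
# Hypothetical helper: split a check_command value on "!" and label the
# pieces. The first piece is the command name, the rest map to $ARG1$,
# $ARG2$, and so on. Commas inside a piece are left alone.
split_check_command() {
    old_ifs=$IFS
    IFS='!'
    set -- $1          # word-split the value on "!" only
    IFS=$old_ifs
    echo "command_name=$1"
    shift
    i=1
    for arg in "$@"; do
        echo "ARG$i=$arg"
        i=$((i + 1))
    done
}

# Usage (not run here):
#   split_check_command 'check_ping!100.0,20%!500.0,60%'
# prints:
#   command_name=check_ping
#   ARG1=100.0,20%
#   ARG2=500.0,60%
```

Feeding it a check_nrpe line instead shows why the thresholds shift by one: the remote command name takes the $ARG1$ slot.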

Even more interesting would be $USER1$ as this actually is defined in resources.cfg here as follows:
Code:
# Sets $USER1$ to be the path to the plugins
# 102402 jda -- $USER1$=/usr/local/nagios/libexec
$USER1$=/usr/lib/nagios/plugins
NOTE: Here /usr/lib/nagios/plugins is just a symbolic link to /usr/local/nagios/libexec. My predecessor did that so don't let it confuse you. It is just telling you WHERE the commands such as check_ping are located.

So based on all that I'm pretty sure he was wrong in his conclusion. One thing he did say that makes sense is to try the command line first so that you're bypassing all configurations. Most of the plugins/commands will give you usage when you type them.

So if you type check_ping it will give you usage and typing it with a -h (not -H) will give you help.

So to test command line just do:
check_ping -H <nrpe hostname> -w 100.0,20% -c 500.0,60%
Then you're doing what I'm saying was in his services.cfg file.

Try the above command substituting your NRPE host in the appropriate place and see what you get. You should see something like:
PING OK - Packet loss = 0%, RTA = 0.22 ms
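
When testing from the command line it can help to see the exit status next to the text, since the exit status is what gets mapped to a state. A minimal sketch, assuming the usual plugin convention of 0 = OK, 1 = WARNING, 2 = CRITICAL; run_plugin is a made-up wrapper, and treating every other value as UNKNOWN is a simplification of this sketch.

```shell
#!/bin/sh
# Hypothetical wrapper: run a plugin command line, then print its state
# name, exit status and output on one line. Returns the plugin's status.
run_plugin() {
    status=0
    output=$("$@") || status=$?
    case $status in
        0) state=OK ;;
        1) state=WARNING ;;
        2) state=CRITICAL ;;
        *) state=UNKNOWN ;;   # simplification: anything else
    esac
    echo "[$state, exit $status] $output"
    return $status
}

# Usage on the Nagios host (not run here):
#   run_plugin ./check_ping -H <nrpe hostname> -w 100.0,20% -c 500.0,60%
```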
 
Old 02-08-2007, 11:07 AM   #26
Mangenius
Member
 
Registered: Jan 2007
Posts: 30

Original Poster
Rep: Reputation: 15
Alright. On the nagios host, I used a host that I know was pingable and on the same subnet (you'll see why that's important in a minute). I should've mentioned in the beginning, too, that I'm running Nagios 1.4.1. So, let's say the Nagios Host is 1.1.1.1, and I want to ping 1.1.1.5, which is on the same subnet. And I also want to ping the NRPE host, which is 2.2.2.2

So, I typed in ./check_ping -H 1.1.1.5 -w 100.0,20% -c 500.0,60%. That returns a PING OK. But when I type in ./check_ping -H 2.2.2.2 -w 100.0,20% -c 500.0,60%, it returns CRITICAL plugin timeout after 10 seconds. I get this in Nagios, as well, except it'll say CRITICAL socket timeout after 10 seconds.

I checked up on this and it seems there is a bug in the C code for the check_ping plugin, according to this site (http://article.gmane.org/gmane.netwo...ns.devel/4431). See, if I'm pinging over the same subnet, the plugin has enough time to process the ICMP request. But if I'm sending it over a different subnet, it takes a little longer to process the request, and there's a way to change it in the C code. The problem is I don't know where to get a check_ping.C file or check_ping.cpp file. I don't think I have any of those on my system. I have to get the source code for the plugins and recompile the check_ping plugin. That's my theory. But I wonder if there's something that I've missed.

Update: Actually, there's a -t option for the check_ping command that lets me specify the timeout in seconds (default is 10). But the plugin kept timing out; first I set it to 20, then 60. So I guessed the problem was still in the code, but when I tried recompiling the .c file after the change (g++ check_ping.c) it gave me a bunch of errors. I just tried 300 seconds and it worked: I got packet loss = 0%, RTA=5.80ms, but the state is CRITICAL because of the RTA. There is something funky with the NRPE host machine, but at least I know I can ping it and it does return a response at some point. I did this on the Nagios host machine. So I guess I don't have to bother recompiling and going through all that trouble.

The one thing that really confuses me is the syntax for passing arguments into the check_nrpe command. In the services.cfg file, I type check_nrpe!check_ping!100.0,20%!500.0,60% and in the nrpe.cfg, on the NRPE host machine, I have:

command[check_ping]=/usr/local/nagios/libexec/check_ping -w $ARG1$ -c $AGR2$ -p 5

I keep getting this warning problem with the integer and percentage bs. Is there a prime example for doing check_ping using nrpe, because I really wonder if everything is defined the way it should be.

Last edited by Mangenius; 02-08-2007 at 11:50 AM.
 
Old 02-08-2007, 12:19 PM   #27
Mangenius
Member
 
Registered: Jan 2007
Posts: 30

Original Poster
Rep: Reputation: 15
The next thing I tried was the following in the nagios host machine:

./check_nrpe -H (NRPE host IP) -c check_ping -a 100.0,20% 500.0,60% -t 60
check_ping: %s: Warning threshold must be integer or percentage!

I'm still getting this warning threshold thing by passing the arguments into the command directly. Which means, at least to me, the problem is on the NRPE host, not the Nagios host. But what exactly do I have to type in the nrpe.cfg file for this to work?

Thank you very much.
 
Old 02-08-2007, 01:00 PM   #28
MensaWater
Guru
 
Registered: May 2005
Location: Atlanta Georgia USA
Distribution: Redhat (RHEL), CentOS, Fedora, Debian, FreeBSD, HP-UX, Solaris, SCO
Posts: 6,009
Blog Entries: 5

Rep: Reputation: 785
check_ping is designed to respond back to Nagios, not to NRPE. Remember NRPE is a plugin (i.e. an added thing) to Nagios, so it is not a full Nagios and does not behave exactly the same way as Nagios.

You likely could make the first NRPE host run a check_ping to the second NRPE host by putting what you need in a script and ensuring the output of the script is formatted the way NRPE expects. You would then put the script within your first host's nrpe.cfg.

Once that is done from the standpoint of the Nagios host the service being monitored would be for the first NRPE host rather than the second. It would be your services.cfg file where you'd make it distinguish the two based on what you told it.

An example of a simple script we use on one of our NRPE hosts to check its Netbackup (Veritas Backup Software) daemons is:

Code:
#!/bin/ksh
#
# This script checks the state of a Netbackup daemon and passes the status to EMA
# Usage: check_netbackup.sh <daemon name>
#
# Exit statuses understood by Nagios
CRITICAL_STATE=2
WARNING_STATE=1
OK_STATE=0

# Count processes matching the daemon name, excluding the greps and this script
cnt=`/usr/bin/ps -aef | /usr/bin/grep -v grep | grep -v "$0" | /usr/bin/grep -c "$1"`
if [ "$cnt" -eq 0 ]
then
    echo "CRITICAL: $1 is DOWN!"
    exit $CRITICAL_STATE
fi

echo "OK: $1 Daemon is up"
exit $OK_STATE
The entries in the nrpe.cfg file that call this script:
Code:
command[check_bprd]=/usr/local/nagios/libexec/check_netbackup.sh bprd
command[check_bpdbm]=/usr/local/nagios/libexec/check_netbackup.sh bpdbm
command[check_tldd]=/usr/local/nagios/libexec/check_netbackup.sh tldd
command[check_tldcd]=/usr/local/nagios/libexec/check_netbackup.sh tldcd
command[check_vmd]=/usr/local/nagios/libexec/check_netbackup.sh vmd
command[check_ltid]=/usr/local/nagios/libexec/check_netbackup.sh ltid
Note that we check multiple daemons. Notice also the command name is defined as "check_bprd" for the first daemon, "check_bpdbm" for the second etc...

Now on the Nagios server in services.cfg we have entries (I'm only giving the ones for 2 of the above daemons here):
Code:
define service{
        use                     generic-service
        host_name               ATUBKS01
        service_description     Netbackup Request Daemon
        contact_groups          ux-admins, noc-op
        check_command           check_nrpe!check_bprd
        }

define service{
        use                     generic-service
        host_name               ATUBKS01
        service_description     Netbackup Database Daemon
        contact_groups          ux-admins, lchen, noc-op
        check_command           check_nrpe!check_bpdbm
        }
The key to the scripts you create is that they should only report one line of text. Nagios does NOT understand multiple lines of output. Also you must use the exit statuses understood by Nagios.
Those are the first values you see defined in the script above. You can't use different values (e.g. 2 always means Critical). You don't have to define them in the script but you do have to use them at the exit point. The default exit if you don't put one in is 0 for success or 1 for failure. (That's basic scripting). You can't rely on those defaults for your Nagios/NRPE scripts because:

A) They leave out the 2 you need for critical

B) By default you get the status of the last command issued within the script. So if you had a script that did something like:
mount this
mount that
su -
ls
If you ran it as a non-root user it would error on the mounts and the su - but the ls would work, so it would exit with a status 0 even though 75% of the commands had failed. Putting in the exit command with a value of 2 at the failure, however, would ensure you gave it the status value understood as critical and would also keep it from wasting time executing the other commands.

The Netbackup script above is a very simple one. I've got some more detailed ones if you get that far. Also you can do this with perl or whatever is your choice. As noted it is the single line of text AND the exit value that are important, not what you used to produce them.
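
Point B can be sketched like this: stop at the first failing step, print a single line, and return 2 explicitly rather than relying on the status of the last command. check_steps is a made-up helper, not something from Nagios or NRPE, and the steps passed to it are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: run each argument as a shell command, in order.
# On the first failure print ONE line and return 2 (critical) without
# running the remaining steps. If every step succeeds, print ONE line
# and return 0.
check_steps() {
    for step in "$@"; do
        if ! sh -c "$step" >/dev/null 2>&1; then
            echo "CRITICAL: '$step' failed"
            return 2
        fi
    done
    echo "OK: all $# steps succeeded"
    return 0
}

# Usage inside a plugin script (not run here; paths are placeholders):
#   check_steps 'mount /this' 'mount /that'
#   exit $?
```

The step output is thrown away on purpose so that only the single summary line reaches Nagios.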
 
Old 02-08-2007, 04:00 PM   #29
Mangenius
Member
 
Registered: Jan 2007
Posts: 30

Original Poster
Rep: Reputation: 15
THANK YOU VERY MUCH for that long post on the scripting solution. It is most APPRECIATED!!!! I didn't implement it yet, but I will surely do so when I get to the point NRPE is up and running.

Unfortunately, as of right now, nagios is reporting check_ping: Could not parse arguments, strictly for check_nrpe.

On the nagios host, I type this at the command prompt:
./check_nrpe -H (IP of NRPE host) -c check_ping -a 100.0,20% 500.0,60% -t 300

and it returns this:
check_ping: Could not parse arguments

Executing it from the command prompt is just manually passing in the arguments. But it can't parse the arguments.

Here's what I have in checkcommands.cfg:
# 'check_ping' command definition
define command{
command_name check_ping
command_line $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
}

# 'check_nrpe' command definition
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -a $ARG2$ $ARG3$
}


Here's what I have in services.cfg:
check_command check_nrpe!check_ping!100.0,20%!500.0,60%


And here's what I have in nrpe.cfg:
command[check_ping]=/usr/local/nagios/libexec/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5


And yet I'm still getting this 'could not parse arguments' business. I must be doing something wrong, but I have yet to figure out what. I can't believe how much frustration it is (for me) to setup this stupid nrpe plugin!

Update: Now it says check_ping: %s: Warning threshold must be integer or percentage! instead. I added "-t 300" to the end of the check_nrpe command in checkcommands.cfg.

Last edited by Mangenius; 02-08-2007 at 04:33 PM.
 
Old 02-09-2007, 08:41 AM   #30
MensaWater
Guru
 
Registered: May 2005
Location: Atlanta Georgia USA
Distribution: Redhat (RHEL), CentOS, Fedora, Debian, FreeBSD, HP-UX, Solaris, SCO
Posts: 6,009
Blog Entries: 5

Rep: Reputation: 785
First: Do you actually have "check_ping" installed on the NRPE host as well as the Nagios host? Normally it isn't because it's not part of the NRPE plugin.

Second: You don't need to add check_ping in the nrpe.cfg on the first NRPE host to do a check_ping of the first NRPE host. In fact you don't even need NRPE installed for check_ping to work - it does all the work from the Nagios host side. The only reason you would add check_ping on the first NRPE host is to run the check_ping to the second NRPE host from the first NRPE host.

Third: nrpe.cfg does not use the same variable definitions as services.cfg. Remember it is a plugin (i.e. not part of the base Nagios setup). You would have to hard code the thresholds rather than trying to pass them in as variables.

Fourth: Just to reiterate an earlier point - make sure you're doing all nrpe.cfg changes on the NRPE hosts and not the Nagios host.
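
A minimal sketch of the third point, with the thresholds hard coded in the first NRPE host's nrpe.cfg instead of passed in as variables. The command name check_ping_second_host and the <second host IP> placeholder are made up for illustration; the plugin path and thresholds are the ones already used in this thread.

```
# nrpe.cfg on the first NRPE host (hypothetical entry)
command[check_ping_second_host]=/usr/local/nagios/libexec/check_ping -H <second host IP> -w 100.0,20% -c 500.0,60% -p 5
```

On the Nagios host the service's check_command would then presumably be check_nrpe!check_ping_second_host with no thresholds after it, assuming a check_nrpe definition that only passes -c $ARG1$.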

You need to do this in an ordered fashion. First make sure you can do a simple ping of the NRPE host you're interested in from the Nagios host.

Earlier you said:
Quote:
So, I typed in ./check_ping -H 1.1.1.5 -w 100.0,20% -c 500.0,60%. That returns a PING OK. But when I type in ./check_ping -H 2.2.2.2 -w 100.0,20% -c 500.0,60%, it returns CRITICAL plugin timeout after 10 seconds. I get this in Nagios, as well, except it'll say CRITICAL socket timeout after 10 seconds.
I'm assuming you did that on the Nagios host. On that Nagios host what happens when you just try "ping 2.2.2.2"? If a simple ping doesn't work then a check_ping won't either. It would indicate you can't do ICMP to 2.2.2.2.
 
  

