Linux - Networking (LinuxQuestions.org): This forum is for any issue related to networks or networking.
Routing, network cards, OSI, etc. Anything is fair game.
If so, I would like to know: if you are monitoring a switch with, let's say, 50 interfaces (monitoring the interfaces as well), what would happen if you shut down the switch for a certain period of time? Would you get only one notification that the switch is down, or would you get 51 notifications: one that the switch is down plus one for each interface that is down?
It depends on how you have Nagios set up. It could go either way. The way I configure it, I would get a "host down" for the switch itself, and then "host unreachable" for all the interfaces, because I set the switch itself as the parent of the interfaces. You could, however, define each interface as an individual host, in which case you'd get 51 down messages. Or, if you disable the unreachable notifications or play with the dependencies.cfg file, you could get just one down message for the switch itself.
You'd have to post at least your hosts.cfg and services.cfg files, and probably dependencies.cfg as well, for me to answer with certainty.
Like most Linux apps, Nagios can do damn near anything; it is all in how you set it up.
Peace,
JimBass
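To illustrate the parent/child approach described above, here is a minimal sketch in Nagios object-config syntax. It assumes the same `generic-switch` template the original poster's host uses; the downstream switch `Sydswacc01` and its address are hypothetical. The first object uses `parents` so the child goes UNREACHABLE (not DOWN) when the core switch fails, and opts out of UNREACHABLE notifications; the second shows the alternative of an explicit host dependency, the kind of object kept in dependencies.cfg.

```cfg
# Hypothetical access switch sitting behind the core switch.
# "parents" tells Nagios that if Sydswcore01 is DOWN, this host
# should be marked UNREACHABLE rather than DOWN.
define host{
        use                     generic-switch
        host_name               Sydswacc01          ; hypothetical name
        alias                   Sydney Access Switch 01
        address                 10.10.1.250         ; hypothetical address
        parents                 Sydswcore01
        notification_options    d,r                 ; notify on DOWN and RECOVERY only, not UNREACHABLE
        }

# Alternative: an explicit host dependency. While the master host
# (Sydswcore01) is DOWN or UNREACHABLE, notifications for the
# dependent host are suppressed.
define hostdependency{
        host_name                       Sydswcore01
        dependent_host_name             Sydswacc01
        notification_failure_criteria   d,u
        }
```

With either object in place, taking Sydswcore01 offline should produce one DOWN notification for the core switch rather than one per downstream device; which approach fits better depends on how the rest of the configuration is laid out.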
Thanks for your reply.
I have set up the switch as a host, using its management VLAN IP as the address, and each interface as a service, e.g.:
define host{
        use             generic-switch
        host_name       Sydswcore01
        alias           Sydney Core Switch 01
        address         10.10.1.253
        hostgroups      switches
        }
define service{
        use                     generic-service
        host_name               Sydswcore01
        service_description     Gi2/1 IP:10.10.1.254 WAN-01 Link Status
        check_command           check_snmp!-C public -o ifOperStatus.1 -r 1 -m RFC1213-MIB
        }
Couldn't find the files you mentioned above. I can't remember if I have seen those files before; my version is probably different. I have files like switch.cfg, windows.cfg, commands.cfg, etc.
I guess I could test it and see what happens, but I wanted to avoid getting 50 emails for the same device...
Yeah, I compile Nagios from source, and those files I mentioned get installed if you go through the full install. I have never worked with a packaged version of Nagios. Test it and see. Block the Nagios box's ability to reach the switch's VLAN address, or simply change the address to something else (i.e., tell Nagios the switch lives at 172.16.0.235 instead of 10.10.1.253, assuming nothing is at 172.16.0.235). If you start getting emails, shut down Nagios, change the configs back, and restart.
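As a sketch of the second testing option, assuming the switch host definition posted earlier in the thread: temporarily point the host at an address that nothing answers on, then put the real address back after the test. The 172.16.0.235 address is assumed to be unused on your network.

```cfg
# TEMPORARY test version of the switch definition. The address is
# changed to an IP assumed to be unused, so Nagios sees the switch as DOWN.
define host{
        use             generic-switch
        host_name       Sydswcore01
        alias           Sydney Core Switch 01
        address         172.16.0.235    ; real address is 10.10.1.253; restore after testing
        hostgroups      switches
        }
```

Running `nagios -v` against your main nagios.cfg before restarting will catch syntax errors in the edited file.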
Nagios is amazing and takes some time to completely learn, but "playing" with it is what will get you there.
I say mess around and get 50 emails, as JimBass suggested. In one location we monitor over 500 hosts and over 1000 services (yes, the 3D graph for this is amazing :] ). We receive maybe 10-15 notifications a week.
Man, I laughed when you worried about 50 emails. I have 8 Nagios installs, each monitoring between 20 and 110+ hosts, with multiple services on most hosts. I average probably 100-150 emails a day, and that's assuming nothing actually goes "hard down". I do have mine set up so that when a front host goes down (like a firewall or router), we get "host unreachable" messages for all the hosts behind it.
But yeah, Nagios is absolutely great. I use it to keep a lookout on everything, and with the nice GUI, my boss can look at a webpage and see exactly where a problem is and what needs to be fixed. Unquestionably a great piece of work.
Peace,
JimBass
I agree. I started using Nagios a few months ago, and I have to say I am more and more satisfied. I guess the best way to learn what Nagios can do is to test it; I also want to avoid whinging from other sysadmins about how many emails they are getting from Nagios.
By the way, how many devices/services is it recommended to monitor with one Nagios instance? For example, we have 8 sites with a total of 1500 users. Would one Nagios instance be capable of monitoring all this?
Yeah, one instance of Nagios can easily do that. I don't know what you're monitoring in relation to users, and I don't think the user count should matter; either way, Nagios can handle it easily. You might want to install it at different locations just for IP reachability. Everything I monitor is on private IPs, so I need one server on each of those LANs. If everything you're monitoring can be reached over the network, then one machine can easily handle it.
Regarding your test worries, define a new contact group with yourself as the only member and assign it to the switch. Then when you fail the switch, only you'll get the notifications, not all the sysadmins.
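A minimal sketch of that test contact group in Nagios object-config syntax. The contact name, email address, and group name are hypothetical, and the `generic-contact` template is assumed to exist (it ships with the Nagios 3 sample config; if yours lacks it, the contact needs its notification periods, options, and commands set directly).

```cfg
# Hypothetical test contact: replace the name and email with your own.
define contact{
        use             generic-contact         ; template assumed present
        contact_name    testadmin
        alias           Test Admin
        email           testadmin@example.com
        }

# Contact group whose only member is the test contact.
define contactgroup{
        contactgroup_name       test-only
        alias                   Test notifications only
        members                 testadmin
        }

# While testing, set this in the switch's host definition and in each
# of its interface service definitions, then revert afterwards:
#       contact_groups  test-only
```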
Don't leave out OpenNMS, though. We're currently dropping Nagios for OpenNMS where I'm employed. Don't get me wrong, I like Nagios, but OpenNMS is way better, especially when upper management likes pretty graphs. It's like Nagios and Cacti rolled into one application, and it's smarter too, with autodiscovery.
OpenNMS will also notify you of only one outage if there's something dependent behind it. Say a router or switch goes down and you monitor ports on the switch: it won't page you for every single port on the switch that is down.
Hmmm, I've spent far too much time with Nagios and Cacti to swap them for other monitoring software. I will have a look, though...
Anyway, these two do everything we need so far....
Yeah, I can set up a test contact group like that.
Another question: what practices do you use on your network for monitoring devices/services? For example, for the more sensitive devices I am configuring notifications to go out the first time the device is unreachable, whereas by default it's after 3 failed checks. I am wondering how other people are setting this up on their networks...
I tend to monitor IP cameras over wireless networks, so I usually let a check fail 10 times before a notice is sent. I think you'll drive yourself crazy with a first-miss message. Even something working well will miss from time to time.
Take for example 2 computers right next to each other, plugged into a hub. Say you ping the second machine from the first all night long. When you check it in the morning, you'll find there were a few missed pings. It might have missed 10 out of 100,000, but it won't be exactly 100%.
I wouldn't go lower than 3 failures, but it's your system. Set it up as you want, and see if it works. If you get "false downs" because notifications go out after too few failures, increase the number of failures required before you get notified. If it is something like a production server doing e-commerce, then you might well want a notice at the first failure.
Peace,
JimBass
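The per-check failure count discussed above is set with `max_check_attempts`. A minimal sketch of the two extremes, assuming the sample `generic-service` template and the sample `check_http` and `check_ping` command definitions; the host names are hypothetical and must match hosts you have defined.

```cfg
# Critical e-commerce service: the first failed check goes HARD
# and triggers a notification.
define service{
        use                     generic-service
        host_name               Shopweb01               ; hypothetical host
        service_description     HTTP
        check_command           check_http
        max_check_attempts      1
        }

# Flaky wireless IP camera: checks 1-9 stay SOFT; only the 10th
# consecutive failure goes HARD and notifies.
define service{
        use                     generic-service
        host_name               Cam01                   ; hypothetical host
        service_description     PING
        check_command           check_ping!100.0,20%!500.0,60%
        max_check_attempts      10
        }
```

How long the camera can be down before anyone hears about it also depends on the retry interval set for the service, so the two values are worth tuning together.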
Yeah, I think you're right. OK, one last question. I can't get it to monitor, for example, the D: or E: drive on a Windows machine. I have this for the C: drive:
check_command check_nt!USEDDISKSPACE!-l c -w 70 -c 80
I tried this for D:, but it doesn't work:
check_command check_nt!USEDDISKSPACE!-l d -w 70 -c 80
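For comparison, a complete service definition for the D: drive would look like this, assuming the `check_nt` command definition from the sample commands.cfg (which passes `$ARG1$` as the `-v` variable and `$ARG2$` as extra options) and the hypothetical host name `winserver`. The check_nt plugin documents USEDDISKSPACE as taking just the drive letter in `-l`, so `-l d` matches that form; if it still fails, running the check_nt plugin by hand against the Windows host should show the raw error returned for that drive.

```cfg
# D: drive usage check, mirroring the working C: drive definition.
# Thresholds are percent used: WARNING at 70%, CRITICAL at 80%.
define service{
        use                     generic-service
        host_name               winserver               ; hypothetical; use your Windows host's name
        service_description     D:\ Drive Space
        check_command           check_nt!USEDDISKSPACE!-l d -w 70 -c 80
        }
```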