Yep, and yep. We currently have five high-end servers (8xCPU, 8xGBs RAM, RAID 0+1x15K) running Nagios. We are also using NSClient, as the majority of our servers are Windows. At last testing we could do around 20,000 service-level checks per minute, and we use every bit of it.
But because of the sheer volume of checks, statistically we will jump above the three-check limit from time to time. And we are on the hook 24/365.
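To put a rough number on "from time to time", here is a back-of-the-envelope sketch. The per-check spurious failure rate (`p_spurious`) is a made-up figure for illustration, not something we measured, and the model assumes failures are independent, which a busy server will violate.

```python
def expected_false_alerts_per_day(checks_per_minute, p_spurious, max_attempts=3):
    """Rough expected count of max_attempts consecutive spurious failures per day.

    Assumes each check fails spuriously with probability p_spurious,
    independently of every other check (a simplification).
    """
    # Chance that any given check completes a run of max_attempts failures.
    p_run = p_spurious ** max_attempts
    runs_per_minute = checks_per_minute * p_run
    return runs_per_minute * 60 * 24


# Hypothetical: 20,000 checks/minute and a 1% spurious failure rate.
print(expected_false_alerts_per_day(20000, 0.01))
```

Even at a hypothetical 1% spurious rate, that volume works out to a couple of dozen false alerts a day.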
So, we are REALLY hoping to find a way for Nagios to distinguish between "was unable to check" and "was able to check, and there was a failure".
Because sometimes a server is just busy. And that is not necessarily something we want to alert on...
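To make the distinction concrete, this is roughly the behavior we are after, sketched as a wrapper around a plugin. It relies on the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and reports UNKNOWN when the check itself could not run or timed out. The wrapper, the `run_check` name, and the 10 second timeout are all hypothetical, and this is not an existing Nagios feature. It also cannot tell apart a plugin that reports its own timeout as CRITICAL.

```python
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def run_check(cmd, timeout_s=10):
    """Run a plugin command (list of args) and return (exit_code, message).

    "Was unable to check" maps to UNKNOWN; a real result from the
    plugin (OK, WARNING, CRITICAL) is passed through unchanged.
    """
    if not cmd:
        return UNKNOWN, "UNKNOWN - no check command given"
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The target (or the plugin) never answered in time.
        return UNKNOWN, "UNKNOWN - check timed out after %ss" % timeout_s
    except OSError as exc:
        # The plugin could not be started at all.
        return UNKNOWN, "UNKNOWN - could not run check: %s" % exc

    output = proc.stdout.strip() or proc.stderr.strip()
    if proc.returncode not in (OK, WARNING, CRITICAL):
        # Anything outside the known result codes is treated as "unable to check".
        return UNKNOWN, output or "UNKNOWN - unexpected exit code %d" % proc.returncode
    return proc.returncode, output


def main(argv):
    """Entry point sketch: a real script would call sys.exit(main(sys.argv[1:]))."""
    code, message = run_check(argv)
    print(message)
    return code
```

With something like this in place, whether UNKNOWN results notify anyone would be controlled by leaving `u` out of the service's `notification_options`, assuming the plugin itself does not report a busy server as CRITICAL.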