Just a couple of thoughts.
You've got a bunch of servers and you want to know whether they're on time. If I've got it right, you're serving time to them: NTP is running on all those servers, you're the time source, and each of them should be within a second or two of you. If a server is not within a few seconds, you want to take a harder look at it and see whether NTP is running on that box.
So, why not compare your system date to "their" system date (keeping it really simple)?
In a loop, get your own system date as the number of seconds since the epoch with the date command, run the same command on the target system, and subtract one value from the other to see how far out of time the target may be. If it is out too far, run a check on the target that tells you whether NTP is running on it. NTP normally synchronizes to within milliseconds, so a couple of seconds out of synchronization is a sign that you need to look further.
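Something like this is what I have in mind for the comparison. It's only a sketch: it assumes passwordless ssh to each target and a date that understands +%s (GNU and BSD date both do), and the names MAX_SKEW, abs_diff and check_skew are made up for the example.

```shell
#!/bin/sh
# Sketch only: compare local and remote clocks in whole seconds.
# Assumes key-based ssh to the target and a date that supports +%s.

MAX_SKEW=2   # hypothetical threshold in seconds; tune it by experiment

# Print the absolute difference between two epoch timestamps.
abs_diff() {
    d=$(( $1 - $2 ))
    if [ "$d" -lt 0 ]; then
        d=$(( 0 - d ))
    fi
    echo "$d"
}

# Print "host skew" if the host's clock differs from ours by more
# than MAX_SKEW seconds; print nothing if it is within bounds.
check_skew() {
    host=$1
    remote=$(ssh -o ConnectTimeout=5 "$host" date +%s) || return 1
    # Read the local clock after ssh returns so that connection
    # setup time is not counted as skew.
    local_now=$(date +%s)
    skew=$(abs_diff "$local_now" "$remote")
    if [ "$skew" -gt "$MAX_SKEW" ]; then
        echo "$host $skew"
    fi
}

# Example use, with hosts.txt holding one address per line:
#   while read -r h; do check_skew "$h" </dev/null; done < hosts.txt
```

The </dev/null in the example keeps ssh from swallowing the rest of the host list on stdin.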
If you're working from a list of addresses, you'd want to ping the remotes first to see whether they're alive. You might also want to remotely execute ntpq on a questionable target, and maybe a few other things, then notify yourself (e-mail works) that server XYZ has a problem. You don't care about a system that is on time; you only care about one that is out.
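For the follow-up on a questionable box, here's a rough sketch. The assumptions: key-based ssh again, the Linux form of ping (-c for count, -W for a timeout in seconds; other systems spell these differently), and ntpq installed on the target. is_alive and report_host are names I invented, and the mail line is only one way to do the notification.

```shell
#!/bin/sh
# Sketch only: gather a short report on one questionable host.

# Succeed if the host answers a single ping (Linux ping flags assumed).
is_alive() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# Print a short report for the host: either "no ping reply" or
# whatever "ntpq -p" (the peer list) prints on the remote side.
report_host() {
    host=$1
    if ! is_alive "$host"; then
        echo "$host: no ping reply"
        return 0
    fi
    # A failure here may come from ssh or from ntpq itself, so the
    # captured output is included either way for a human to read.
    if peers=$(ssh -o ConnectTimeout=5 "$host" ntpq -p 2>&1); then
        printf '%s: ntpq -p output follows\n%s\n' "$host" "$peers"
    else
        printf '%s: remote ntpq check failed\n%s\n' "$host" "$peers"
    fi
}

# Example use (the recipient address is up to you):
#   report_host server-xyz | mail -s "clock problem on server-xyz" root
```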
Doing this will go lickety-split, but you'd want to execute the date command on both systems for every check; after that it's a little shell arithmetic and that's pretty much that. It might take a little experimenting to learn the "best" difference value: system clocks do drift a little, but not that much.
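For the experimenting part, one possible approach is to log the difference for every host for a while and look at the worst case before settling on a threshold. This sketch assumes you have collected lines of the form "host seconds" (a format I made up for the example); max_skew is likewise just an illustrative name.

```shell
#!/bin/sh
# Sketch only: find the largest skew in "host seconds" lines on stdin.
# Assumes every line is well formed, with a whole number of seconds.
max_skew() {
    max=0
    while read -r _host skew; do
        if [ "$skew" -gt "$max" ]; then
            max=$skew
        fi
    done
    echo "$max"
}

# Example use, with skew.log being the hypothetical log file:
#   max_skew < skew.log
```

If the worst value on healthy machines stays at 0 or 1 over a few days, a threshold of a couple of seconds should be safe.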
Hope this helps some.