We use the Nagios check_ntp_time plugin to monitor the time on our servers. Unfortunately, the service is flapping and reporting a lot of false positives. It happens on a random server at a random time of day and lasts for about 10-30 minutes. When the problem occurs, we get:
watch01:~ # /usr/lib/nagios/plugins/check_ntp_time -H lb01 -w 1 -c 2 -v
sending request to peer 0
response from peer 0: offset 0.07509887218
sending request to peer 0
response from peer 0: offset 0.07508444786
sending request to peer 0
response from peer 0: offset 0.07499825954
sending request to peer 0
response from peer 0: offset 0.07510817051
discarding peer 0: stratum=0
overall average offset: 0
NTP CRITICAL: Offset unknown|
When everything is OK, we get the following (I used a different server so that I did not have to wait for lb01 to recover):
watch01:~ # /usr/lib/nagios/plugins/check_ntp_time -H web02 -w 1 -c 2 -v
sending request to peer 0
response from peer 0: offset 0.0002282857895
sending request to peer 0
response from peer 0: offset 0.0002194643021
sending request to peer 0
response from peer 0: offset 0.0002347230911
sending request to peer 0
response from peer 0: offset 0.0002293586731
overall average offset: 0.0002282857895
NTP OK: Offset 0.0002282857895 secs|offset=0.000228s;1.000000;2.000000;
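The two runs differ only in the `discarding peer 0: stratum=0` line, so it may help to see what the server itself reports while the check is failing. Below is a minimal sketch in Python, assuming the standard NTP header layout from RFC 5905 (leap indicator, version, and mode packed into the first byte, stratum in the second). The function names `parse_ntp_header` and `query_ntp_header` are my own and are not part of nagios-plugins.

```python
import socket
import struct

NTP_PACKET_LEN = 48  # size of a basic NTP packet without extensions


def parse_ntp_header(packet: bytes) -> dict:
    """Decode the first two bytes of an NTP packet (RFC 5905 header layout)."""
    if len(packet) < 2:
        raise ValueError("packet too short for an NTP header")
    flags, stratum = struct.unpack("!BB", packet[:2])
    return {
        "leap": flags >> 6,             # 3 means the clock is unsynchronized
        "version": (flags >> 3) & 0x7,
        "mode": flags & 0x7,            # 4 means a server reply
        "stratum": stratum,             # 0 means unspecified or invalid
    }


def query_ntp_header(host: str, timeout: float = 2.0) -> dict:
    """Send one client request (version 3, mode 3) and parse the reply header."""
    # 0x1B = LI 0, version 3, mode 3 (client); the rest of the packet is zeros.
    request = bytes([0x1B]) + bytes(NTP_PACKET_LEN - 1)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(request, (host, 123))
        reply, _ = sock.recvfrom(512)
    return parse_ntp_header(reply)
```

Calling `query_ntp_header("lb01")` from watch01 during a flap would show whether the reply really carries stratum 0, and whether the leap field is 3 at the same time. If both are true, the server is announcing that it is not synchronized, which would point at ntpd on lb01 rather than at the plugin. That is only a hypothesis to check, not a diagnosis.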
We are using:
- check_ntp_time v1.4.15 (nagios-plugins 1.4.15) on Debian squeeze.
The remote NTP daemon is:
- ntpd - NTP daemon program - Ver. 4.2.4p4
I have already found several forum threads that describe the problem: 1, 2, 3. Each of them advises upgrading nagios-plugins, because versions prior to 1.4.13 had a bug in handling an inserted leap second. But we are already running a newer version of nagios-plugins.
The bug may still exist in 1.4.15: http://sourceforge.net/tracker/?func=detail&atid=397597&aid=3314686&group_id=29880