Recently I've encountered two instances in a VPC whose clocks have drifted. What I've noticed is that, in the ntpq -p output, none of their time servers has a * or + prefix, unlike another instance in the same autoscaling group whose time is accurate.
“+” – Good and a preferred remote peer or server (included by the combine algorithm)
“*” – The remote peer or server presently used as the primary reference
EC2 instance where time has drifted
$ ntpq -p
      remote              refid      st t when poll reach    delay   offset   jitter
====================================================================================
 y.ns.gin.ntt.ne     .STEP.          16 u    - 1024     0    0.000    0.000    0.000
 ns1.unico.com.a     .STEP.          16 u    - 1024     0    0.000    0.000    0.000
 saul.foodworks.     .STEP.          16 u    - 1024     0    0.000    0.000    0.000
 b.pool.ntp.uq.e     .STEP.          16 u    - 1024     0    0.000    0.000    0.000
 internalntpserver1. 10.68.10.1       8 u  815 1024   377    0.862  -477696  2391.53
 internalntpserver2. 10.68.2.226      7 u  213 1024   377    1.755  -477012  1861.00
EC2 instance where time is correct
# ntpq -p
      remote              refid      st t when poll reach    delay   offset   jitter
====================================================================================
 0.time.itoc.com     .STEP.          16 u    - 1024     0    0.000    0.000    0.000
 a.pool.ntp.uq.e     .STEP.          16 u    - 1024     0    0.000    0.000    0.000
 node01.au.verbn     .STEP.          16 u    - 1024     0    0.000    0.000    0.000
 node02.au.verbn     .STEP.          16 u    - 1024     0    0.000    0.000    0.000
+internalntpserver1. 10.68.10.1       8 u  680 1024   377    1.551  -260.56   77.778
*internalntpserver2. 10.68.2.226      7 u  719 1024   377    0.631  -114.34  334.611
Restarting the ntpd daemon fixed it, but I can't find anything online about what could have caused this behaviour.
Any help would be very much appreciated.
Thank you.
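In the first example, both the offset and the jitter are very high. ntpq reports them in milliseconds, so the offsets are around -477 seconds (about eight minutes) and the jitter is 1.9 to 2.4 seconds. With jitter measured in seconds, NTP will probably just decide that both reference servers are insane and will refuse to sync.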
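If you want to catch this state before the clock is minutes out, the by-eye check for a * prefix can be scripted. This is only a rough sketch: it assumes the usual ntpq -p layout (tally code in the first character of each peer line, then ten whitespace-separated columns), and the names parse_ntpq and has_system_peer are made up for illustration. The sample rows are copied from the question.

```python
def parse_ntpq(output):
    """Parse `ntpq -p` text into a list of peer dicts.

    Assumption: column 0 of every peer line is the tally code
    (a space when the peer has none), followed by ten fields.
    """
    peers = []
    for line in output.splitlines():
        body = line.strip()
        if not body or body.startswith("remote") or body.startswith("="):
            continue  # skip blank lines, the header and the ruler
        fields = line[1:].split()
        if len(fields) != 10:
            continue  # unexpected layout; skip rather than guess
        peers.append({
            "tally": line[0],
            "remote": fields[0],
            "reach": int(fields[6], 8),      # reach is an octal bitmask
            "offset_ms": float(fields[8]),   # ntpq prints milliseconds
            "jitter_ms": float(fields[9]),
        })
    return peers


def has_system_peer(peers):
    """True when some peer carries the '*' tally code."""
    return any(p["tally"] == "*" for p in peers)


# Rows taken from the two outputs in the question.
DRIFTED = """\
 y.ns.gin.ntt.ne .STEP. 16 u - 1024 0 0.000 0.000 0.000
 internalntpserver1. 10.68.10.1 8 u 815 1024 377 0.862 -477696 2391.53
 internalntpserver2. 10.68.2.226 7 u 213 1024 377 1.755 -477012 1861.00
"""
HEALTHY = """\
 0.time.itoc.com .STEP. 16 u - 1024 0 0.000 0.000 0.000
+internalntpserver1. 10.68.10.1 8 u 680 1024 377 1.551 -260.56 77.778
*internalntpserver2. 10.68.2.226 7 u 719 1024 377 0.631 -114.34 334.611
"""

for name, text in (("drifted", DRIFTED), ("healthy", HEALTHY)):
    peers = parse_ntpq(text)
    print(name, "has system peer:", has_system_peer(peers))
```

Feeding it the real output of ntpq -p from cron or a health check, and alerting when has_system_peer is false, would have flagged the drifted instances without waiting for someone to notice the wrong time.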
Your other problem is that the rule for NTP reference servers is "one or four". A man with two clocks is never sure which clock is wrong; a man with three clocks can exclude the one that doesn't agree with the other two. But you should have four, in case one of them is not reachable.
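To make the "one or four" reasoning concrete, here is a toy majority vote over clock offsets. It is not the selection algorithm ntpd actually runs, just an illustration of why two disagreeing sources leave you stuck. The function names and the 50 ms tolerance are arbitrary choices for the example.

```python
def largest_agreeing_group(offsets_ms, tolerance_ms):
    """Largest set of offsets that all lie within tolerance_ms of each other."""
    best = []
    ordered = sorted(offsets_ms)
    for i, low in enumerate(ordered):
        # Everything from `low` up to `low + tolerance_ms` agrees pairwise.
        group = [o for o in ordered[i:] if o - low <= tolerance_ms]
        if len(group) > len(best):
            best = group
    return best


def trusted_clocks(offsets_ms, tolerance_ms=50.0):
    """Return the agreeing strict majority, or None if there is none."""
    group = largest_agreeing_group(offsets_ms, tolerance_ms)
    if len(group) * 2 > len(offsets_ms):
        return group
    return None


print(trusted_clocks([3.0]))                   # one clock: trusted blindly
print(trusted_clocks([0.0, 500.0]))            # two clocks: no way to pick
print(trusted_clocks([0.0, 5.0, 500.0]))       # three clocks: outlier excluded
print(trusted_clocks([0.0, 5.0, 2.0, 500.0]))  # four clocks: room to lose one
```

With four sources you can lose one to an outage and still have three left to outvote a bad clock, which is the point of the rule.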
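The reachability of the other reference servers is also a big problem: the four public servers show a reach of 0 on both instances, so no replies are arriving from them. You need to figure out what firewall is blocking the NTP packets (UDP port 123) going to those servers.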
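One way to test reachability independently of ntpd is to send a bare client request yourself and see whether anything comes back. A minimal sketch, assuming plain IPv4 and an SNTP-style exchange on UDP port 123; the function names are invented for the example, and a None result only tells you the round trip failed, not which hop dropped the packet.

```python
import socket
import struct

NTP_PORT = 123
NTP_UNIX_DELTA = 2208988800  # seconds from 1900-01-01 to 1970-01-01


def build_request():
    """48-byte client packet: LI=0, version=3, mode=3 (client)."""
    return b"\x1b" + 47 * b"\x00"


def parse_transmit_time(packet):
    """Return the server's transmit timestamp as Unix seconds."""
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    # Transmit timestamp: 32-bit seconds + 32-bit fraction at bytes 40..47.
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_UNIX_DELTA + fraction / 2**32


def probe(host, timeout=2.0):
    """Return the server's time, or None if no reply arrives in time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_request(), (host, NTP_PORT))
            packet, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and name-resolution failures
            return None
    return parse_transmit_time(packet)
```

Calling probe() with one of the public server names from an affected instance, and again from a machine outside the VPC, should show whether the packets are being dropped on the way out of (or back into) the VPC.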