Backstory: I have a couple of internal startum 1 NTP clocks with GPS receivers, and 2 public NTP servers that are virtualized on top of VMware ESXi which take time from the S1 clocks and distribute it. Otherwise this setup works rather fine and provides good time when compared to other public servers.
Problem: When I reboot the virtual machines, they do not start syncing properly, and get stuck in an unsynchronised state. Below is the ntpq -p output after a reboot.
root@server:~$ ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
192.168.1.40 .GPS. 1 u 27 64 3 1.533 -258.43 5948.73
192.168.2.40 .GPS. 1 u 24 64 3 1.118 -258.47 6138.19
192.168.3.42 .GPS. 1 u 24 64 3 0.709 -258.42 5655.02
194.100.49.151 194.100.49.134 2 u 22 64 3 8.124 -258.74 7131.65
gbg1.ntp.se .PPS. 1 u 26 64 3 21.856 -258.43 4876.90
ntp2.sptime.se .PPS. 1 u 23 64 3 19.991 -258.42 7764.97
ntp1.sptime.se .PPS. 1 u 27 64 3 20.489 -258.41 8574.46
If I then run ntp service restart I get this:
root@server:~$ ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
192.168.1.40 .GPS. 1 u 2 64 1 1.517 -258.45 0.065
192.168.2.40 .GPS. 1 u 1 64 1 1.126 -258.46 0.025
192.168.3.42 .GPS. 1 u 2 64 1 0.719 -258.42 0.020
194.100.49.151 194.100.49.134 2 u 5 64 1 8.041 -258.72 0.000
gbg1.ntp.se .PPS. 1 u 6 64 1 21.839 -258.41 0.000
ntp2.sptime.se .PPS. 1 u 4 64 1 19.968 -258.41 0.000
ntp1.sptime.se .PPS. 1 u 3 64 1 20.418 -258.43 0.000
A second later it steps:
root@server:~$ ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
192.168.1.40 .STEP. 16 u 2 64 0 0.000 0.000 0.000
192.168.2.40 .STEP. 16 u 2 64 0 0.000 0.000 0.000
192.168.3.42 .STEP. 16 u 8 64 0 0.000 0.000 0.000
194.100.49.151 194.100.49.134 2 u - 64 1 7.976 -0.261 0.000
gbg1.ntp.se .PPS. 1 u - 64 1 21.840 0.060 0.000
ntp2.sptime.se .STEP. 16 u 6 64 0 0.000 0.000 0.000
ntp1.sptime.se .STEP. 16 u 6 64 0 0.000 0.000 0.000
And after that we're back to normal operation:
root@server:~$ ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
192.168.1.40 .GPS. 1 u 1 64 1 1.474 0.044 0.017
*192.168.2.40 .GPS. 1 u 1 64 1 1.102 0.030 0.005
192.168.3.42 .GPS. 1 u 1 64 1 0.674 0.049 0.009
194.100.49.151 194.100.49.134 2 u 8 64 1 7.976 -0.261 0.000
gbg1.ntp.se .PPS. 1 u 8 64 1 21.840 0.060 0.000
ntp2.sptime.se .PPS. 1 u 6 64 1 19.979 0.059 0.000
ntp1.sptime.se .PPS. 1 u 5 64 1 20.440 0.048 0.000
So it seems that after reboot the system clock is off by quite a bit, which is to be expected, but why ntpd doesn't panic and just steps the clock is a bit hard for me to understand.
Here's my ntp.conf
tinker panic 0
# /etc/ntp.conf, configuration for ntpd; see ntp.conf(5) for help
driftfile /var/lib/ntp/ntp.drift
# Enable this if you want statistics to be logged.
statsdir /var/log/ntpstats/
statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
filegen peerstats file peerstats type day enable
filegen clockstats file clockstats type day enable
# You do need to talk to an NTP server or two (or three).
#server ntp.your-provider.example
# pool.ntp.org maps to about 1000 low-stratum NTP servers. Your server will
# pick a different set every time it starts up. Please consider joining the
# pool: <http://www.pool.ntp.org/join.html>
server 192.168.1.40 iburst
server 192.168.2.40 iburst
server 192.168.3.42 iburst
server time1.mikes.fi
server ntp1.gbg.netnod.se
server ntp2.sptime.se
server ntp1.sptime.se
# Access control configuration; see /usr/share/doc/ntp-doc/html/accopt.html for
# details. The web page <http://support.ntp.org/bin/view/Support/AccessRestrictions>
# might also be helpful.
#
# Note that "restrict" applies to both servers and clients, so a configuration
# that might be intended to block requests from certain clients could also end
# up blocking replies from your own upstream servers.
# By default, exchange time with everybody, but don't allow configuration.
restrict -4 default kod notrap nomodify nopeer noquery
restrict -6 default kod notrap nomodify nopeer noquery
# Local users may interrogate the ntp server more closely.
restrict 127.0.0.1
restrict ::1
# Clients from this (example!) subnet have unlimited access, but only if
# cryptographically authenticated.
#restrict 192.168.123.0 mask 255.255.255.0 notrust
# If you want to provide time to your local subnet, change the next line.
# (Again, the address is an example only.)
#broadcast 192.168.123.255
# If you want to listen to time broadcasts on your local subnet, de-comment the
# next lines. Please do this only if you trust everybody on the network!
#disable auth
#broadcastclient
ntpd default step threshold is 0.125 s, and panic threshold after the first packet is 1000 s. In other words, out of design conditions includes an offset jumping by 15+ minutes.
You captured the initial packet, the step, and eventually peer selection. Takes a minute or two to establish, due to how the NTP algorithms work, even if you use the
iburst
option. Reach of 3 indicates only two packets were received so far. Wait longer, if you are not dropping NTP packets.If the initial offsets or stepping is not acceptable, you can wait until ntpd or the operating system reports synchronized. For systemd on Linux, try depending on
systemd-time-wait-sync.service
.