One of the external NTP servers we use as a time source (currently the primary one) seems to have stopped responding to NTP requests. Unfortunately, the NTP functionality on our core router (a Cisco 6509) hasn't switched to the secondary external NTP server as expected. As a result, our core router, which is pretty much our main internal NTP source, is 2 minutes behind.
I'm planning to fix this by pointing the core router at the external NTP source that is currently working. I'm wondering how much a 2-minute change will affect my users and services, especially since these days we rely heavily on certificate-based authentication.
We're a Windows/Cisco shop.
Internal NTP setup:
[Core Router 1 / Cisco 6509]:
looking out to two external NTP servers (of which the primary one is not responding to NTP calls)
[Core Router 2]:
Synching with Core router 1 (primary), working external router (secondary)
[Other Cisco network devices]:
Synching with Core router 1 (primary), core router 2 (secondary)
[Domain controller(s)]:
Synching with Core router 1
[All windows clients/servers]:
Synching with domain controllers
Unless extremely accurate timekeeping is mission-critical for you, there should be no discernible effect for your users, aside from their clocks changing by 2 minutes.
The possible exception is if they declare your NTP server to be "insane" as a result of the large change (which would require you to restart the NTP service on the affected systems to force them to sync the clock - though you can do this without an outage).
While you're fixing this, here are a few other pointers:
You should configure your systems that look at external NTP sources to look at several (4-5) servers from the public NTP pool project -- preferably geographically appropriate ones.
Having more NTP servers allows the selection algorithm to ignore ones that break/go insane and keep your clock accurate.
In a configuration like yours I would point Core Router 1 and Core Router 2 at external clock sources (not each other). This gives you two independently synchronized clocks which should be within a few ms of each other, but if one of your routers goes insane it can't hurt the other one.
In a configuration like yours I would point the domain controllers at BOTH core routers (again to protect against one going down).
If you want to protect against a clock going insane you should add a third authoritative NTP server (or list one of your routers twice and hope it's not the one that loses its mind…)
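The advice above about listing several external servers assumes each candidate actually answers NTP requests, which is exactly what failed in the question. A minimal sketch of a probe, using only the Python standard library and the usual four-timestamp offset formula, might look like this. The pool hostnames in the usage section are placeholders, and the function names are made up for this illustration.

```python
import socket
import struct
import time

NTP_EPOCH_DELTA = 2208988800  # seconds between 1900-01-01 (NTP) and 1970-01-01 (Unix)


def build_request() -> bytes:
    """48-byte client request: LI=0, VN=3, Mode=3 gives 0x1B, the rest is zero."""
    return b"\x1b" + 47 * b"\0"


def _timestamp(packet: bytes, start: int) -> float:
    """Decode a 64-bit NTP timestamp at `start` into Unix seconds."""
    seconds, fraction = struct.unpack("!II", packet[start:start + 8])
    return seconds - NTP_EPOCH_DELTA + fraction / 2**32


def offset_from_reply(packet: bytes, t1: float, t4: float) -> float:
    """Estimated offset of the server clock relative to ours, in seconds.

    t1 = our send time, t4 = our receive time (both Unix seconds).
    t2 and t3 are the server's receive and transmit timestamps.
    """
    if len(packet) < 48:
        raise ValueError("reply too short to be an NTP packet")
    t2 = _timestamp(packet, 32)
    t3 = _timestamp(packet, 40)
    return ((t2 - t1) + (t3 - t4)) / 2


def query(server: str, timeout: float = 2.0):
    """Return the estimated offset in seconds, or None if the server does not answer."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t1 = time.time()
        try:
            sock.sendto(build_request(), (server, 123))
            packet, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and name-resolution failures
            return None
        t4 = time.time()
    return offset_from_reply(packet, t1, t4)


if __name__ == "__main__":
    # Placeholder hostnames; substitute the servers you are evaluating.
    for host in ("0.pool.ntp.org", "1.pool.ntp.org"):
        print(host, query(host))
```

A server that prints `None` here is not a usable source. This is only a one-shot check from a workstation, not a substitute for what the router's own NTP status output reports.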
Domain defaults for Windows allow the time to be off +/- 300 seconds before authentication stops working, so you'll be fine. Here's a fairly exhaustive article on the subject, which even mentions how to change your tolerance for time skew with a domain-level GPO. It's at Computer Configuration -> Policies -> Windows Settings -> Security Settings -> Account Policies -> Kerberos Policy -> Maximum tolerance for computer clock synchronization.
That said, you should have your authoritative time source (which is usually the Domain Controller holding the PDC emulator role in a Windows domain) sync with an external NTP source, like pool.ntp.org. More info from Technet, here.
And in response to the other answer, this does not require downtime. Just re-point your authoritative time source, and the rest of the domain-joined computers will sync themselves up as well.
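To put the 300-second default next to the 2-minute offset from the question, here is a trivial sketch of the comparison. It assumes the strict reading of the policy (anything beyond the tolerance is rejected), and the function name is invented for this example.

```python
DEFAULT_MAX_SKEW_SECONDS = 300  # domain default quoted above


def within_kerberos_tolerance(skew_seconds: float,
                              max_skew_seconds: float = DEFAULT_MAX_SKEW_SECONDS) -> bool:
    """True if a clock difference (in either direction) is inside the tolerance."""
    return abs(skew_seconds) <= max_skew_seconds


# The router in the question is 2 minutes behind, i.e. a skew of -120 seconds.
print(within_kerberos_tolerance(-120))
```

With the default tolerance, the 120-second skew is well inside the allowed window; it would only become a problem if the tolerance had been tightened below two minutes.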
EDIT: since @voretaq7 mentioned it, I should point out that we have only one system that sees an outside time source: our PDC emulator. All devices, including the network gear, sync to it. We find this to be a better arrangement, since the networking gear won't reject authentication due to time skew, but domain-joined computers using Kerberos (which is all of them, for us) will. So in that regard, it's not particularly important to have accurate time on our network gear, but it is on our Windows systems, doubly so because we run our time-keeping software for the hourly employees on a Windows server too.
The Windows clients will actually have no problem logging in whatsoever. The description of the Maximum tolerance for computer clock synchronization policy is pretty well inaccurate these days.
A client with a severely wrong clock will get a response from the server establishing the skew between their clocks - authentication then proceeds normally (with the client adjusting itself to account for the apparent clock skew).
The description is right about one thing; the policy still effectively sets the timer for replay attacks - but, in terms of legitimate traffic, the communication is robust against large clock skews.
See this MS KB article for more information.
You may want to consider using NTP server(s) other than your core Cisco equipment: heavy NTP traffic puts a high CPU load on the Cisco equipment, which could result in network problems.
I take it you cannot schedule a short downtime? I would push for one in order to restart the NTP service on all affected servers. If that is not possible, then you will have to wait some time for the clocks to converge on their own.
(I was going to make this a comment on voretaq7's answer, but I think it deserves repeating in its own right, since many people make this mistake.)
You need at least 3 (preferably 4-6) time sources for NTP's algorithm to accurately converge on the correct time. If NTP has only two primary sources and they're both out by a significant amount, NTP has no way of knowing which one to trust.
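The point about two sources can be illustrated with a deliberately simplified majority check. This is not NTP's real selection algorithm, only a sketch of the idea that a strict majority of mutually agreeing sources is needed before an outlier can be ignored. The source names, offsets, and the 0.5-second tolerance are all invented for the example.

```python
from typing import Dict, List, Optional


def agreeing_majority(offsets: Dict[str, float],
                      tolerance: float = 0.5) -> Optional[List[str]]:
    """Return the largest group of sources whose offsets lie within
    `tolerance` seconds of each other, but only if that group is a
    strict majority of all sources. Otherwise return None.
    """
    ordered = sorted(offsets.items(), key=lambda item: item[1])
    best: list = []
    start = 0
    # Sliding window over the sorted offsets.
    for end in range(len(ordered)):
        while ordered[end][1] - ordered[start][1] > tolerance:
            start += 1
        if end - start + 1 > len(best):
            best = ordered[start:end + 1]
    if len(best) * 2 > len(ordered):
        return sorted(name for name, _ in best)
    return None


# Two sources that disagree by 2 minutes: no way to tell which one is wrong.
print(agreeing_majority({"a": 0.0, "b": 120.0}))

# Four sources, one of them 2 minutes off: the other three outvote it.
print(agreeing_majority({"a": 0.01, "b": -0.02, "c": 0.0, "d": 120.0}))
```

The first call yields `None` because neither source has a majority behind it; the second yields `['a', 'b', 'c']`, with the bad source `d` excluded.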
The single biggest help to me in understanding this was the diagram on page 9 of the Sun blueprint "Using NTP to Control and Synchronize System Clocks, part III: NTP Monitoring and Troubleshooting". This document disappeared from view when Oracle bought Sun, but you can still find it on the Wayback Machine. There are also plenty of hits around the web if you search for the title.