Currently our web applications contain logic that checks whether data sent to the web server has expired by comparing the timestamp in the data with the server's date/time.
Everything went well until some dude from the data center accidentally modified the date/time on one of the web servers and caused disruptions in our web services. My managers are of course not happy with this, and said we shouldn't have used timestamps to check expiry in the first place... anyway...
Network Time Protocol is implemented. Because our data centers are spread across different continents, we have one NTP server in each data center. The servers within a data center run cron jobs that check their time against the NTP server in the same data center; if the time is out of sync, the job automatically updates the server's date/time.
But our managers are still not happy with this, and think it could easily cause the same problem again. E.g. what if someone accidentally modifies the NTP server's date/time? What if all the NTP servers are out of sync with each other? Which NTP servers can we really trust? And so on.
So my questions are:
- What is the current practice for syncing date/time between servers across multiple data centers or locations?
- How does one manage timestamps between web apps? E.g. Server A sends data (containing Server A's timestamp) to Server B, and Server B compares its own time against the timestamp in the data to see whether it has expired. This is to avoid HTTP replay. (A simplified sketch of this check follows the list.)
- Should we really not use a timestamp check at all?
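For concreteness, the check we run today looks roughly like this (simplified; the constants are only illustrative):

```python
import time

ALLOWED_SKEW = 30   # seconds of clock difference tolerated between servers
MAX_AGE = 300       # how long a timestamped payload stays valid

def is_expired(sent_ts, now=None):
    """Return True if data stamped at sent_ts (epoch seconds, UTC) should be
    rejected by the receiving server."""
    if now is None:
        now = time.time()
    age = now - sent_ts
    # Reject stale data, and also data "from the future" beyond the allowed
    # skew, since that points at clock trouble or a forged timestamp.
    return age > MAX_AGE + ALLOWED_SKEW or age < -ALLOWED_SKEW
```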
Thanks & Best Regards
This is your first problem, and it is most likely caused by a combination of overly broad administrative access and insufficient training.
Changing the system time requires administrative privileges, and changing the time manually on a system that not only has the correct time, but whose time is being managed by NTP, is a sign of insufficient training. Solve this problem first, because until you solve it, inaccurate system time is probably just the most visible of your problems. What else are they doing on this system, and why?
If a viable alternative has actually been proposed, I'd at least consider it. Somehow I suspect that isn't the case.
I'd recommend two in each data center. And I'd have them each reference a different set of external NTP servers as well as referencing each other. This is going to result in more stable time and make you much more robust to single failures. I'm also paranoid and over-engineer things, so there's that. Still, NTP servers require roughly nil in terms of resources so run them wherever.
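As a rough sketch of what I mean, assuming a classic ntpd setup (the hostnames and pool choices here are placeholders), the first of the two NTP servers in a data center might be configured like this:

```
# /etc/ntp.conf on ntp1 in data center 1 (sketch; pick your own references)
server 0.pool.ntp.org iburst      # external reference set
server 1.pool.ntp.org iburst
server 2.pool.ntp.org iburst
peer   ntp2.dc1.example.com       # the other local NTP server
driftfile /var/lib/ntp/ntp.drift
```

The second local server would carry a different set of external references and peer back to the first.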
This is a bad plan. Cron has no place changing the time in an NTP system. The servers should run real NTP clients. These clients should each reference the (two) local NTP servers.
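On the web servers themselves that means running ntpd (or an equivalent real client) with a configuration as small as this, again with placeholder hostnames:

```
# /etc/ntp.conf on an ordinary web server (sketch)
server ntp1.dc1.example.com iburst
server ntp2.dc1.example.com iburst
driftfile /var/lib/ntp/ntp.drift
```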
If you want to use cron, use cron on each server to verify that the server is successfully synchronized with both local NTP servers. You can do this by examining the output of the ntpq command. You should learn about the output of the ntpq command; it is your friend.
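If you're not familiar with it, `ntpq -pn` lists the peers the local daemon knows about, and the line whose first column starts with `*` is the peer it is currently synchronized to. A cron-friendly check might be no more than:

```
# Exit non-zero (so cron/monitoring can alert) if there is no sync peer.
ntpq -pn | awk '$1 ~ /^\*/ { found = 1 } END { exit !found }'
```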
To address the questions you report as having been raised:
The first question isn't insane; a bit paranoid if taken to the extreme, but fine. The answer is partly the access and training problem described above, and partly redundancy: with two local servers each referencing several external sources, clients will reject the one server whose time has gone wrong.
The second is addressed by configuring the NTP servers to reference each other. They will tend to pull together, all other things being equal. Also by using independent trustworthy reference clocks.
These failure cases can get complex to describe, but NTP aims for stability first, and accuracy when it has an accurate source.
As far as trust goes, most people who run a public NTP server have no reason to interfere with your time, and many of them have a positive reason to provide accurate time. Rank candidate servers by how strong that interest is (national time laboratories, for example, exist largely to provide it).
Also, and this is important: The NTP protocol is designed to synchronize time to within milliseconds. Not seconds. If you use cron + ntpdate, your time can be off by multiple seconds (thank you variable latency!). NTP will keep your clocks much more stable and accurate under similar circumstances.
Properly configured NTP and UTC across all the servers is the best practice. There are GPS master clock servers you can buy, if this is a huge deal, you have the money, and can justify buying one for each data center. This also sounds like an operations problem -- they should monitor the times on the servers and alert if they get significantly out of whack.
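A minimal version of that monitoring could be a script that asks each host for its clock offset and alerts past a threshold. Here is a sketch using the third-party Python ntplib package; the hostnames and threshold are placeholders:

```python
#!/usr/bin/env python
"""Alert when a monitored host's clock offset exceeds a threshold (sketch)."""
import sys

import ntplib  # third-party: pip install ntplib

HOSTS = ["web1.dc1.example.com", "web2.dc1.example.com"]  # placeholder names
MAX_OFFSET = 0.5  # seconds; define "significantly out of whack" to taste

def main():
    # Offsets are measured against this machine's clock, so run the check
    # from a host that is itself NTP-disciplined.
    client = ntplib.NTPClient()
    problems = []
    for host in HOSTS:
        try:
            response = client.request(host, version=3, timeout=5)
        except Exception as exc:
            problems.append(f"{host}: query failed ({exc})")
            continue
        if abs(response.offset) > MAX_OFFSET:
            problems.append(f"{host}: offset {response.offset:+.3f}s")
    if problems:
        print("Clock drift detected:")
        print("\n".join(problems))
        sys.exit(1)  # non-zero exit so cron or your monitoring raises an alert

if __name__ == "__main__":
    main()
```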