This looks like the same issue as "Windows Server 2022 Time Service Jumping into the future". I've also reported the issue to Microsoft through the Feedback Hub: https://aka.ms/AAkwnpl
As the system clock is essential for correctly working software and is probably the most central piece of shared mutable state, this issue is wreaking havoc on both our own systems and everyone we communicate with, causing ripples all the way to critical infrastructure.
We first noticed this in August 2022 on a 2019 server. The clock jumped to January 2023 but later corrected itself. Unfortunately, we only found it after the relevant logs had been purged, so we were unable to debug it further.
But last month, we experienced it again, this time on a 2016 server. The clock was set 55 days into the future.
15 seconds later, Time-Service noticed that the clock differed from our domain controller's and that it had to change the clock by -4454176 seconds. It then backed off, as it thought the correction was larger than 4294967295 seconds.
15 minutes after the first change, the clock was set again, this time backwards, to 12h26m43s in the future.
15 seconds after the second change, Time-Service noticed the clock was off, and this time it corrected it, as the offset was within a reasonable window.
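For scale, the offsets in these events can be converted with a few lines of Python. This is just a quick arithmetic sketch (the helper name `to_dhms` is ours): the first correction works out to roughly 51.5 days, while 4294967295 is 2^32 - 1 seconds, roughly 136 years, which is why backing off at that point looks so odd.

```python
# Quick arithmetic on the offsets reported by the time service events.

def to_dhms(seconds: int) -> str:
    """Format the absolute value of `seconds` as days/hours/minutes/seconds."""
    days, rest = divmod(abs(seconds), 86_400)
    hours, rest = divmod(rest, 3_600)
    minutes, secs = divmod(rest, 60)
    return f"{days}d {hours}h {minutes}m {secs}s"

# Correction the time service wanted to apply after the first jump.
first_correction = -4_454_176
# Threshold it compared against; equal to 2**32 - 1 (0xFFFFFFFF).
threshold = 4_294_967_295
# Offset left after the second, backwards jump (12h26m43s).
second_offset = 12 * 3_600 + 26 * 60 + 43

print(to_dhms(first_correction))          # 51d 13h 16m 16s
print(threshold == 2**32 - 1)             # True
print(to_dhms(threshold))                 # 49710d 6h 28m 15s (about 136 years)
print(abs(first_correction) < threshold)  # True
print(second_offset)                      # 44803
```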
And then the same thing happened again three weeks later on the same server, differing only in the details. In the meantime, the server had been both rebooted and patched with a new monthly update.
We're using VMware, configured with two physical hardware clocks. Our two domain controllers are configured to use pool.ntp.org. They should probably be moved to our own stratum 0 hardware, although that's probably not related to our issues.
With help from a few external experts, we have pretty much ruled out erroneous configuration, manual intervention (by mistake, security breach or disloyal employee) and hardware issues, and we're left with "strange Windows bug".
Unfortunately, Server 2016 doesn't log many details about these events, so it's difficult to debug further. Server 2019 and later log more information.
@chris1out in "Windows Server 2022 Time Service Jumping into the future" had the same issue on servers not joined to a domain, so we can probably rule out the domain controller. Those servers were also using the standard time server rather than pool.ntp.org, so we can probably rule out both time sources too. This pretty much leaves a bug in Time-Service as the probable cause. That Server Fault question is the only other documented occurrence we've been able to find.
TL;DR: We have found the most likely root cause: W32time's Secure Time Seeding feature. It looks at the legacy "time" value in SSL handshake headers, which newer SSL implementations fill with random bytes, interprets it as the correct time, and sets the clock accordingly.
Secure Time Seeding can be turned off by setting the UtilizeSslTimeData registry value to 0:
reg add HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\w32time\Config /v UtilizeSslTimeData /t REG_DWORD /d 0 /f
Then w32tm can instruct the time service to reread its configuration:
W32tm.exe /config /update
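To illustrate why random handshake bytes can look like a timestamp, here is a minimal Python sketch. It assumes the TLS 1.2 layout of the 32-byte handshake Random field (a 4-byte big-endian gmt_unix_time followed by 28 random bytes); the function names are hypothetical, and this is not Microsoft's code. When a peer fills all 32 bytes randomly, the same parsing still yields a valid-looking date, just an arbitrary one between 1970 and 2106.

```python
import os
import struct
import time
from datetime import datetime, timezone
from typing import Optional

def legacy_random(now: Optional[int] = None) -> bytes:
    """Hypothetical legacy peer: 4-byte gmt_unix_time + 28 random bytes."""
    ts = int(time.time()) if now is None else now
    return struct.pack(">I", ts) + os.urandom(28)

def modern_random() -> bytes:
    """Hypothetical modern peer: all 32 bytes random, no timestamp at all."""
    return os.urandom(32)

def gmt_unix_time(random_field: bytes) -> datetime:
    """Read the first 4 bytes of a Random field as a big-endian Unix time (UTC)."""
    if len(random_field) != 32:
        raise ValueError("TLS Random must be exactly 32 bytes")
    (seconds,) = struct.unpack(">I", random_field[:4])
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# A legacy peer reports its real clock; a modern peer's bytes parse to noise.
print(gmt_unix_time(legacy_random(1_700_000_000)))  # 2023-11-14 22:13:20+00:00
print(gmt_unix_time(modern_random()))               # any date from 1970 to 2106
```

Both calls succeed, which is the point: nothing in the 4 bytes themselves tells the client whether it is looking at a clock or at noise.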
The longer answer...
Microsoft shipped a feature called Secure Time Seeding in November 2015. It is included in Windows Server 2016 and later and is turned on by default.
It's an attempt to mitigate the problem where a system has no power to drive the system clock (e.g. after a power failure combined with a bad CMOS battery) and thus boots with a completely wrong time, leaving it unable to communicate securely with other sources to reliably get the correct time.
During outgoing SSL handshakes, it looks at "ServerUnixTime" (probably gmt_unix_time in the specification). The TLS 1.2 specification says the following (emphasis mine):
Microsoft's initial blog post says the following (emphasis mine):
This shows that they have misinterpreted the specification and that their implementation is designed around, and operates under, wrong assumptions. It might have worked fine when the world was still using older and less secure implementations, but as more and more servers are updated, the premise no longer holds.
At least they don't trust a single source, but as fewer sources provide the time and more send random values, it's probably only a matter of time before the 4 bytes are similar enough to confuse their algorithm.
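Microsoft's algorithm isn't described here in enough detail to reproduce, so the following is only a back-of-the-envelope model, not their method. It assumes each misbehaving server's 4 bytes are uniformly random 32-bit values and asks how likely two of them are to land within some window of each other (one day here, an arbitrary choice; `p_within` and `expected_close_pairs` are hypothetical helpers). Any single coincidence is rare, but across many handshakes a few are expected.

```python
from fractions import Fraction

SPACE = 2**32  # number of possible 32-bit gmt_unix_time values

def p_within(window: int, space: int = SPACE) -> Fraction:
    """Exact probability that two independent, uniformly random values in
    [0, space) differ by at most `window` (assumes 0 <= window < space)."""
    n, w = space, window
    # Ordered pairs (x, y) with |x - y| <= w: the n pairs on the diagonal,
    # plus 2 * (n - d) pairs for each distance d in 1..w.
    favourable = n * (2 * w + 1) - w * (w + 1)
    return Fraction(favourable, n * n)

def expected_close_pairs(samples: int, window: int) -> float:
    """Expected number of pairs among `samples` random values within `window`
    of each other (linearity of expectation over all pairs)."""
    pairs = samples * (samples - 1) // 2
    return pairs * float(p_within(window))

print(float(p_within(86_400)))              # about 4.0e-05 for a one-day window
print(expected_close_pairs(1_000, 86_400))  # about 20 close pairs
```

The model ignores honest servers entirely; it only shows that "random values happening to agree" stops being exotic once enough handshakes are sampled.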
They further describe using statistical methods to decide when they can interpret the random bytes as a correct time (emphasis mine):
Replacing the timestamp in the gmt_unix_time field with random bytes was discussed on the TLS mailing list in September 2013, two years before Microsoft shipped this feature; the change was implemented in OpenSSL in October 2013 and shipped in January 2014.
u/zanatwo found this issue in March 2017 and reported it on r/sysadmin, but there's no indication that Microsoft knows about it, as the feature is still enabled.
It was rediscovered by u/Thranx in January 2022 and also reported on r/sysadmin.
And again, starting at the beginning of this year, by @chris1out on Server Fault.
As we and others are experiencing, the issue is increasing in frequency, probably because fewer servers report the actual time and more send random values instead. It will probably keep increasing in frequency and hit more users.
The system clock is the most important shared mutable state on the system, and bugs that change the time to a wildly different value wreak havoc on all systems and have repercussions far beyond the single server they happen on. This is without a doubt the most serious bug/misfeature I've ever encountered, and Microsoft needs to disable it ASAP.
Getting in touch with Microsoft is difficult, so if you're experiencing the same issue, please report it to Microsoft.
Thanks a lot to @test-is-prod for sharing their findings and pointing me to the Reddit post by /u/zanatwo!
References: