Question

Stefano M

Asked: 2013-11-21 05:38:39 +0800 CST

Linux kernel detecting wrong processor frequency

After a cold boot of a Debian 6.0.8 server (HP ProLiant), ntpd played havoc with the system time: offset and jitter relative to the usual, reliable reference time servers grew without limit. (Note that an identical twin server had no problem at all.) After many unsuccessful attempts to fix the problem on the ntpd side, I decided to try a reboot, and everything went back to normal.

While investigating the problem I found this discrepancy, which could explain my clock problems:

root@n1:~# zgrep Detected /var/log/dmesg*
/var/log/dmesg:[    0.004000] Detected 2400.110 MHz processor.
/var/log/dmesg.0:[    0.004000] Detected 2383.579 MHz processor.
/var/log/dmesg.1.gz:[    0.004000] Detected 2400.036 MHz processor.
/var/log/dmesg.2.gz:[    0.004000] Detected 2400.298 MHz processor.
/var/log/dmesg.3.gz:[    0.004000] Detected 2400.165 MHz processor.
/var/log/dmesg.4.gz:[    0.004000] Detected 2400.410 MHz processor.

Note that in the second-to-last boot (the problematic one) the detected CPU frequency is a clear outlier. Without the outlier, the error and standard deviation of the detected frequency with respect to the nominal one are +0.15 MHz ± 0.25 MHz. For the problematic boot the error is -16.4 MHz, which is about 100 times greater than expected.
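The outlier can also be picked out mechanically. The following is only a sketch: the log lines are copied from the zgrep output above, the 2400 MHz nominal value is the one implied by the text, and the helper names and the 1000 ppm threshold are illustrative assumptions, not anything the kernel or ntpd uses.

```python
import re
from statistics import median

# Lines copied verbatim from the zgrep output in the question.
LOG_LINES = """\
/var/log/dmesg:[    0.004000] Detected 2400.110 MHz processor.
/var/log/dmesg.0:[    0.004000] Detected 2383.579 MHz processor.
/var/log/dmesg.1.gz:[    0.004000] Detected 2400.036 MHz processor.
/var/log/dmesg.2.gz:[    0.004000] Detected 2400.298 MHz processor.
/var/log/dmesg.3.gz:[    0.004000] Detected 2400.165 MHz processor.
/var/log/dmesg.4.gz:[    0.004000] Detected 2400.410 MHz processor.
""".splitlines()

DETECTED = re.compile(r"^(?P<file>[^:]+):.*Detected (?P<mhz>\d+\.\d+) MHz processor")
NOMINAL_MHZ = 2400.0  # nominal frequency assumed from the question text


def parse_detected(lines):
    """Return (file, MHz) pairs for every 'Detected ... MHz processor' line."""
    readings = []
    for line in lines:
        match = DETECTED.search(line)
        if match:
            readings.append((match.group("file"), float(match.group("mhz"))))
    return readings


def find_outliers(readings, threshold_ppm=1000.0):
    """Flag readings that differ from the median by more than threshold_ppm.

    The threshold is an arbitrary illustrative value.
    """
    mid = median(mhz for _, mhz in readings)
    return [(name, mhz) for name, mhz in readings
            if abs(mhz - mid) / mid * 1e6 > threshold_ppm]


readings = parse_detected(LOG_LINES)
outliers = find_outliers(readings)
for name, mhz in outliers:
    print(f"{name}: {mhz:.3f} MHz ({mhz - NOMINAL_MHZ:+.1f} MHz from nominal)")
```

Using the median rather than the mean as the reference keeps the single bad boot from shifting the baseline it is compared against.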

My questions:

Can an error of this type make the ntp time discipline unstable/unusable? Is this the reason for my clock problems?
Is this type of behavior a symptom of flaky hardware? Should the server go in for hardware maintenance?

Update

Some useful data:

kernel is 2.6.32-5-amd64 (Debian 2.6.32-48squeeze4)
current_clocksource is tsc
error for lpj is (of course) consistent with error on CPU freq

Some context lines for the above grep

[    0.000000] hpet clockevent registered
[    0.000000] Fast TSC calibration using PIT
[    0.004000] Detected 2400.110 MHz processor.
[    0.000008] Calibrating delay loop (skipped), value calculated using timer frequency.. 4800.22 BogoMIPS (lpj=9600440)
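The relation between lpj and the detected frequency in these lines can be checked by hand. The sketch below assumes a kernel built with CONFIG_HZ=250; this is not stated in the question, although it is consistent with the 4 ms granularity of the 0.004000 timestamps. Under that assumption, a timer-derived lpj is simply the number of TSC cycles per jiffy. The function names are illustrative.

```python
HZ = 250  # assumed CONFIG_HZ value; not stated in the question


def lpj_from_mhz(mhz, hz=HZ):
    """Loops per jiffy when derived from the timer frequency: cycles per second / HZ."""
    khz = round(mhz * 1000)      # work in integer kHz to avoid float noise
    return khz * 1000 // hz


def bogomips_from_lpj(lpj, hz=HZ):
    """BogoMIPS corresponding to a given lpj: lpj * HZ / 500000."""
    return lpj * hz / 500_000


lpj = lpj_from_mhz(2400.110)
print(lpj)                               # 9600440
print(f"{bogomips_from_lpj(lpj):.2f}")   # 4800.22
```

Both values match the dmesg line above, which is why an error in the detected frequency shows up unchanged in lpj.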

2 Answers

Stefano M · Answer 1 · 2013-11-21T15:35:26+08:00

I convinced myself that the problem was a misidentified time stamp counter (TSC) frequency.

Apparently the kernel calibrates the TSC against the programmable interval timer (PIT). Usually the identified CPU frequency is 2400.204 ± 0.134 MHz, which corresponds to about 56 ppm accuracy. After the problematic boot the CPU frequency was estimated as 2383.579 MHz, which corresponds to an error of about 6900 ppm, which ntpd was not able to compensate for. In fact, during the first 10h30m of operation the system clock gained about 4m30s, which is about 7000 ppm.
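The figures above can be reproduced from the dmesg values in the question. This is a sketch of the arithmetic only; the variable names are illustrative, and the 500 ppm constant is the frequency-correction limit of the reference ntpd as I understand it, not something taken from the logs.

```python
from statistics import mean, pstdev

# Detected frequencies (MHz) from the five normal boots and the bad one.
good_mhz = [2400.110, 2400.036, 2400.298, 2400.165, 2400.410]
bad_mhz = 2383.579

avg = mean(good_mhz)
spread = pstdev(good_mhz)              # population standard deviation
spread_ppm = spread / avg * 1e6

# Relative calibration error of the problematic boot.
calib_error_ppm = (avg - bad_mhz) / avg * 1e6

# Observed drift: about 4m30s gained over 10h30m of uptime.
drift_ppm = (4 * 60 + 30) / (10 * 3600 + 30 * 60) * 1e6

NTPD_MAX_PPM = 500  # assumed ntpd frequency-correction limit

print(f"normal boots: {avg:.3f} +/- {spread:.3f} MHz ({spread_ppm:.0f} ppm)")
print(f"calibration error on bad boot: {calib_error_ppm:.0f} ppm")
print(f"observed clock drift:          {drift_ppm:.0f} ppm")
print(f"within ntpd's range: {calib_error_ppm <= NTPD_MAX_PPM}")
```

The calibration error and the observed drift agree to within a few percent, and both are more than ten times larger than the assumed 500 ppm limit.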

Since the error in the TSC frequency corresponds to the drift in the system clock I would conclude that the abnormal clock behaviour was caused by a wrong TSC calibration.

However, I have never seen such a large error before: I am still wondering about the possible causes (hardware or software?) of this wrong calibration.

5

sysadmin1138 · Answer 2 · 2013-11-21T12:56:35+08:00

This type of behavior is atypical. A good check would be to monitor the value in the ntp.drift file to see whether it changes significantly while the problem is occurring. If it keeps changing significantly, NTP is trying to compensate for a problem. In that case, it's a sign that the kernel misidentified the true clock frequency at startup, or that the clock itself was running slow during the critical part of boot. Unfortunately, this one event isn't a clear signal of hardware problems.

If it happens again, watch that ntp.drift file.
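One way to do that is to sample the drift file periodically. The sketch below assumes the Debian default location /var/lib/ntp/ntp.drift (check the driftfile line in /etc/ntp.conf) and that the file holds a single frequency offset in ppm. The function names, the hourly interval and the 50 ppm threshold are hypothetical choices for illustration.

```python
import time
from pathlib import Path

DRIFT_FILE = Path("/var/lib/ntp/ntp.drift")  # assumed Debian default location


def parse_drift_ppm(text):
    """Parse the frequency offset (ppm) from the drift file contents."""
    return float(text.split()[0])


def read_drift_ppm(path=DRIFT_FILE):
    """Read the current frequency offset (ppm) from the drift file."""
    return parse_drift_ppm(Path(path).read_text())


def significant_change(previous, current, threshold_ppm=50.0):
    """True if the drift moved by more than threshold_ppm (arbitrary threshold)."""
    return abs(current - previous) > threshold_ppm


def watch(path=DRIFT_FILE, interval_s=3600, samples=24):
    """Print the drift value every interval_s seconds and flag large jumps."""
    last = read_drift_ppm(path)
    for _ in range(samples):
        time.sleep(interval_s)
        now = read_drift_ppm(path)
        flag = "  <-- significant change" if significant_change(last, now) else ""
        print(f"{time.strftime('%Y-%m-%d %H:%M:%S')} drift={now:+.3f} ppm{flag}")
        last = now
```

Calling watch() on the affected server would log one line per hour for a day. A value that keeps jumping, or that sits pinned at the edge of ntpd's correction range, points to the kind of miscalibration described above.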

3
