
Question

wfaulk

Asked: 2010-08-14 11:47:30 +0800 CST

Regular system hiccups on RHEL5 workstation

772

I have a RHEL5 workstation that has recently started to "hiccup". About every thirty seconds, it apparently stops execution completely for about four seconds, and seemingly nothing runs during that period. Long-running processes seem to catch up on their input afterwards, but new processes simply don't get started while it lasts.

Concrete examples:

I have this loop running in a shell:

while date; do
   sleep 0.2
done

The output simply skips the missing seconds (here, 15:20:31 through 15:20:33):

Fri Aug 13 15:20:29 EDT 2010
Fri Aug 13 15:20:29 EDT 2010
Fri Aug 13 15:20:29 EDT 2010
Fri Aug 13 15:20:30 EDT 2010
Fri Aug 13 15:20:30 EDT 2010
Fri Aug 13 15:20:30 EDT 2010
Fri Aug 13 15:20:30 EDT 2010
Fri Aug 13 15:20:34 EDT 2010
Fri Aug 13 15:20:34 EDT 2010
Fri Aug 13 15:20:35 EDT 2010
Fri Aug 13 15:20:35 EDT 2010
Fri Aug 13 15:20:35 EDT 2010

When typing in a terminal, either on the local console or remotely via ssh or telnet, the echo pauses during the unresponsive time but catches up once the machine responds again, apparently with no loss of input, just lag.

Pings go unanswered during the unresponsive time, but are answered when the machine comes back:

64 bytes from xxx: icmp_seq=1911 ttl=64 time=0.203 ms  
64 bytes from xxx: icmp_seq=1912 ttl=64 time=0.199 ms  
64 bytes from xxx: icmp_seq=1913 ttl=64 time=3202 ms  
64 bytes from xxx: icmp_seq=1914 ttl=64 time=2196 ms  
64 bytes from xxx: icmp_seq=1915 ttl=64 time=1197 ms  
64 bytes from xxx: icmp_seq=1916 ttl=64 time=195 ms  
64 bytes from xxx: icmp_seq=1917 ttl=64 time=0.201 ms  
64 bytes from xxx: icmp_seq=1918 ttl=64 time=0.206 ms

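This would seem to imply that the machine is actually receiving input during the unresponsive period, since ping does not retransmit those ICMP requests. The round-trip times also drop by about one second per sequence number (3202, 2196, 1197, 195 ms), which is what you would expect if requests sent one second apart were all held and then answered at the same moment when the machine came back.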
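One way to check that reading is to estimate when each reply actually arrived: assumed send time plus round-trip time. The sketch below assumes ping was run from another host with its default 1-second interval; the function name reply_times and the output layout are hypothetical, chosen for illustration.

```bash
# Sketch: estimate when each ping reply arrived, relative to the first
# request in the input. Assumes ping ran on another host with its default
# 1-second interval, so request N left (N - first_seq) seconds after the first.
# Reads ping output on stdin; prints "seq sent_ms rtt_ms arrived_ms".
reply_times() {
    awk '
        /icmp_seq=/ {
            seq = ""; rtt = ""
            # Extract the sequence number and round-trip time from the line.
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^icmp_seq=/) { split($i, a, "="); seq = a[2] }
                if ($i ~ /^time=/)     { split($i, b, "="); rtt = b[2] }
            }
            if (seq == "" || rtt == "") next    # e.g. error lines without a time
            if (first == "") first = seq
            sent = (seq - first) * 1000         # assumed send time, in ms
            printf "%d %d %.0f %.0f\n", seq, sent, rtt, sent + rtt
        }
    '
}
```

Usage: capture `ping xxx > ping.log` across a hiccup, then run `reply_times < ping.log`. For the excerpt above, sequence numbers 1913 through 1916 come out as arriving at 5202, 5196, 5197 and 5195 ms after request 1911, i.e. within a few milliseconds of each other, which fits the "queued, then answered together" interpretation.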

The output of vmstat 1 also stalls, but it does not catch up; it's almost as if those few seconds didn't happen. It also shows an uptick in runnable processes (the r column) and a downtick in interrupts (in) and context switches (cs):

procs -----------memory----------  ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache    si   so    bi    bo    in   cs us sy  id wa st
 0  0    132 3111220 305540 588012    0    0     0     0  1035  151  1  1  99  0  0
 0  0    132 3111096 305540 588012    0    0     0     0  1019  125  0  0  99  0  0
 0  0    132 3111220 305540 588012    0    0     0    44  1034  154  0  1  99  0  0
 1  0    132 3111096 305540 588012    0    0     0     0  1016  131  0  0  99  0  0
 6  0    132 3111096 305540 588012    0    0     0     0   417   82  0  0 100  0  0
 0  0    132 3111220 305540 588012    0    0     0     0  1041  155  0  1  99  0  0
 0  0    132 3111096 305540 588012    0    0     0     0  1019  123  1  1  99  0  0
 0  0    132 3111220 305540 588012    0    0     0     0  1032  142  0  1  99  0  0
 0  0    132 3111096 305544 588008    0    0     0    44  1019  134  0  0  99  0  0
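To pick these samples out of a long vmstat run automatically, something like the following sketch could be piped after vmstat 1. The 0.5 ratio and the function name flag_interrupt_dips are hypothetical choices for illustration; it only relies on "in" and "cs" being fields 11 and 12 of a data row, as in the output above.

```bash
# Sketch: flag vmstat samples whose interrupt count ("in", field 11)
# drops below half of the previous sample's count. The 0.5 ratio is a
# hypothetical threshold chosen for illustration, not a standard value.
flag_interrupt_dips() {
    awk '
        $1 !~ /^[0-9]+$/ { next }    # skip header and blank lines
        {
            if (prev != "" && $11 < prev * 0.5) {
                printf "dip: in=%d (previous %d), r=%d, cs=%d\n", $11, prev, $1, $12
                fflush()              # show the line immediately when piped from vmstat
            }
            prev = $11
        }
    '
}
```

Run as `vmstat 1 | flag_interrupt_dips`. On the sample above it would report only the row with in=417, r=6 and cs=82.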

Rebooting makes the problem go away for a while; the most recent time, it took six days to come back. I'm not sure whether that interval is consistent.

I initially suspected that the problem might be related to the NVIDIA video driver module, but shutting down X and removing the module made no difference to the symptoms.

There is nothing in dmesg or /var/log/messages that seems remotely relevant or coincides with the hiccups in any way. It does not appear to be a hard drive issue: if it were, I would expect iowait to be prominent during the unresponsive period, and it isn't (the wa column stays at 0 in the vmstat output above). It also seems unlikely to be a hardware problem, since the hiccups are quite regular. I haven't been able to time them to the millisecond, but the pattern is a fairly consistent 30 seconds responsive, 4 seconds frozen (30/4/30/4/30/4).
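A variant of the date loop from the question could time the hiccups with millisecond resolution and also report the spacing between them. This is a sketch assuming GNU date (for %N), GNU sleep (fractional seconds) and 64-bit bash arithmetic; detect_stalls and its default arguments are hypothetical.

```bash
# Sketch: time the hiccups to the millisecond. A loop sleeps for a short
# interval and reports any gap between consecutive timestamps that exceeds
# a threshold, plus the time since the previous stall began.
# Assumes GNU date (for %N), GNU sleep (fractional seconds) and 64-bit bash
# arithmetic. Arguments (hypothetical defaults): threshold in ms,
# iteration count (0 = run forever), sleep interval in seconds.
detect_stalls() {
    local threshold_ms=${1:-1000} iterations=${2:-0} interval=${3:-0.2}
    local prev now gap last_stall="" count=0
    prev=$(( $(date +%s%N) / 1000000 ))
    while [ "$iterations" -eq 0 ] || [ "$count" -lt "$iterations" ]; do
        sleep "$interval"
        now=$(( $(date +%s%N) / 1000000 ))
        gap=$(( now - prev ))    # includes the sleep interval itself
        if [ "$gap" -gt "$threshold_ms" ]; then
            if [ -n "$last_stall" ]; then
                echo "$(date +%T) stall of ${gap} ms, $(( prev - last_stall )) ms since the previous stall began"
            else
                echo "$(date +%T) stall of ${gap} ms"
            fi
            last_stall=$prev     # the stall began after this timestamp
        fi
        prev=$now
        count=$(( count + 1 ))
    done
}
```

Running `detect_stalls 1000` would loop forever and print one line per freeze. Because the measured gap includes the 0.2 s sleep, a 4-second freeze should show up as roughly 4200 ms, and the "since the previous stall began" figure gives the period directly.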

Any ideas?

2 Answers


Christopher Karel · Answer 1 · 2010-08-14T12:30:33+08:00



My money is still on a hard disk failure. I've seen similar freezes on personal Windows desktops, and even an old Sun machine showed the same kind of behavior. However, I won't claim I dug deep enough into those cases to notice seconds dropping from a sleeping shell. Regardless, you might want to see whether you can get any information out of your RAID controller, or otherwise rule out the hard disks.

2

guettli · Answer 2 · 2011-08-04T01:05:26+08:00



My server has hiccups, too. I found this tool: http://www.latencytop.org/. Unfortunately, my hiccups don't occur at regular intervals.

1
