I have a Debian 6 Xen guest that seems to go to sleep from time to time. At random, it stops answering all network requests (HTTP, ssh, ping) and only resumes activity when someone logs on to the console. The server has clearly not crashed, but nothing happens while it sleeps: even the logs (syslogd and klogd) stay empty for the whole period.
Depending on when it happens and how quickly someone can log on to the console, the guest stays in this state for a few minutes, sometimes up to an hour. It happens irregularly, about once a month.
I have access to neither the console nor the Xen host myself, but the hosting company's support team says nothing suspicious shows up on their side. They say it is the only guest on their infrastructure exhibiting this behavior.
The guest runs a Linux 2.6.29.6 kernel compiled by the hosting company and has 2 cores, 4 GB of RAM and 2 GB of swap. The 5-minute load average is not low (between 2 and 3, with peaks up to 5), but swap-in/swap-out activity is low and the swap space is barely used. No kernel messages show up in the logs or in the dmesg output.
This server runs a regular apache + mod_php and proftpd setup, really nothing fancy. AFAICT we have not tweaked any clock-related kernel parameters (however, I'm not sure how to check whether an energy-saving mode or clock stepping is enabled in the kernel setup).
We're running out of clues as to where the issue comes from.
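For anyone with the same doubt about clock and power-saving settings, here is a rough sketch of where one could look from inside the guest. The sysfs and procfs paths are the usual Linux locations, but which of them exist depends on how the kernel was built, so the show_file helper (a name made up for this example) simply reports the ones that are missing. The list of CONFIG_ options is only an assumption about what is worth checking.

```shell
#!/bin/sh
# show_file is a hypothetical helper: print a file's content, or say it is missing.
show_file() {
    if [ -r "$1" ]; then
        printf '%s: %s\n' "$1" "$(cat "$1")"
    else
        printf '%s: not available\n' "$1"
    fi
}

# Clock source the kernel currently uses, and the alternatives it offers.
show_file /sys/devices/system/clocksource/clocksource0/current_clocksource
show_file /sys/devices/system/clocksource/clocksource0/available_clocksource

# CPU frequency scaling governor (only present when cpufreq is active).
show_file /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

# Build-time options, if the kernel configuration is exposed at all.
pattern='CONFIG_(NO_HZ|CPU_FREQ|CPU_IDLE|XEN)='
if [ -r /proc/config.gz ]; then
    zcat /proc/config.gz | grep -E "$pattern" || echo "no matching options found"
elif [ -r "/boot/config-$(uname -r)" ]; then
    grep -E "$pattern" "/boot/config-$(uname -r)" || echo "no matching options found"
else
    echo "kernel config: not available"
fi
```

On a guest, several of these files may be absent because the hypervisor handles power management; that absence is itself a useful answer.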
Edit: I've run find /var -mmin -beforeevent -mmin +afterevent
to look for any file modified during the last hang. All that find reported were files modified just before or just after the event, but nothing in between, even though the hang lasted an hour. This server only has a single partition, so it's not as if only the disk containing /var was down.
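An alternative to working out the minute offsets by hand is to give find absolute timestamps. This is a minimal sketch, assuming GNU find (the -newermt test is a GNU extension); the modified_between name and the timestamps in the usage comment are made up for illustration.

```shell
#!/bin/sh
# modified_between DIR START END (hypothetical helper)
# List regular files under DIR whose mtime is after START and not after END.
# -xdev keeps find on the same filesystem as DIR.
modified_between() {
    find "$1" -xdev -type f -newermt "$2" ! -newermt "$3"
}

# Usage, with placeholder timestamps for the start and end of a hang:
#   modified_between /var "2011-06-01 14:00" "2011-06-01 15:00"
```

An empty result for the hang window matches what the -mmin version showed: nothing on the filesystem was written while the guest was asleep.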
I also have other hosts on the same subnet and they all see this server as offline: SNMP polling fails, and the DB host logs no requests from any PHP application running on the sleeping server.
We also tried setting up a cron job to generate continuous activity (such as constantly pinging another host), but that didn't prevent the server from entering this sleep mode.
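Even if a cron job cannot keep the guest awake, it can at least record when the guest was awake. The sketch below assumes a heartbeat file of epoch timestamps written once a minute by cron; the file path, the report_gaps name and the 120-second threshold are all made up for this example.

```shell
#!/bin/sh
# Hypothetical crontab entry writing one epoch timestamp per minute
# (the % must be escaped inside a crontab):
#   * * * * * date +\%s >> /var/tmp/heartbeat.log

# report_gaps FILE MAX_SECONDS (hypothetical helper)
# Print every pair of consecutive timestamps further apart than MAX_SECONDS.
report_gaps() {
    awk -v max="$2" '
        NR > 1 && $1 - prev > max {
            print "gap of " ($1 - prev) "s between " prev " and " $1
        }
        { prev = $1 }
    ' "$1"
}

# Usage:
#   report_gaps /var/tmp/heartbeat.log 120
```

One caveat: the timestamps come from the guest's own clock, so if that clock stops or is stepped around the hang, the reported gap may not match the real duration. Comparing against timestamps logged on another host would be more reliable.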
For what it's worth, I suspect this issue was related to NTP not being used in the VM. The VM's time drifted away from the host's time, which probably caused the server to enter sleep mode.
After installing and running ntpd, I had no further incidents. However, I no longer have this exact server, and it only ran with NTP enabled for a short time (2 or 3 months), so I cannot say for certain that this was the solution.
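To check whether ntpd is actually keeping the guest in sync, something along these lines should do. It is only a sketch: ntp_status is a made-up name, and the independent_wallclock file is an assumption, since it exists only on some Xen paravirtualized kernels and may well be missing on this one.

```shell
#!/bin/sh
# ntp_status (hypothetical helper): show NTP peers and, where available,
# the Xen wallclock setting.
ntp_status() {
    if command -v ntpq >/dev/null 2>&1; then
        # Lists the configured peers with their offset and jitter.
        ntpq -p
    else
        echo "ntpq not found; is the ntp package installed?"
    fi

    # Assumption: only some Xen guest kernels expose this file.
    # 1 means the guest keeps its own clock, 0 means it follows the host.
    wallclock=/proc/sys/xen/independent_wallclock
    if [ -r "$wallclock" ]; then
        echo "independent_wallclock: $(cat "$wallclock")"
    fi
}

ntp_status
```

In the ntpq output, a peer line starting with * is the one ntpd is currently synchronized to. If no line has it, the daemon is running but not yet correcting the clock.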