Running ESX Server 3 on four identical 4-CPU hosts, with guests on Fibre Channel SAN VMFS.
Guest OS is Fedora 10. Cloned it to create web, jboss, mysql, and memcached templates. Cloned each template into four guests, one for each server.
Out of these 16 guests, one jboss and one mysql guest run so slowly as to be unusable. By "slowly", I mean that no matter how CPU-intensive the process launched, they never utilize more than ~200 MHz of CPU. Moving them between hosts has no effect, so the problem appears to lie with those guests themselves.
BUT! Today I found that they will run at nearly full speed if I either:
- Hold down the spacebar in the console
- Open an SSH session and hold down some repeating key
- Flood them with ICMP packets
In other words, any kind of I/O activity seems to "wake them up", and all processes run at perfectly normal speeds during this time. Stop that I/O activity, and they again slow to a crawl. So apparently, their processes aren't being scheduled unless there is some sort of interrupt activity.
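For anyone wanting to put numbers on the "wakes up under I/O" behaviour, here is a rough sketch (not something I ran at the time) that counts how many busy-loop iterations complete per wall-clock interval. The function name and parameters are made up for illustration, and it assumes the guest clock is trustworthy enough to time short intervals, which may not hold on a guest with timekeeping problems.

```python
import time

def iterations_per_interval(interval=1.0, samples=5, clock=time.perf_counter):
    """Count busy-loop iterations completed in each wall-clock interval.

    A guest that is only scheduled when interrupts arrive should show far
    lower counts when idle than while a key is held down or it is being pinged.
    """
    counts = []
    for _ in range(samples):
        deadline = clock() + interval
        n = 0
        while clock() < deadline:
            n += 1  # pure CPU work, no I/O
        counts.append(n)
    return counts
```

Run it once on a quiet guest and once while flooding it with pings, then compare the two lists of counts; a healthy guest should give roughly the same figures both times.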
Any ideas why?
All guests are fully patched as of today. open-vm-tools is installed, guest time sync is enabled, and the kernel parameters include "notsc" (but changing that doesn't affect this issue).
Have used rsync in --dry-run mode to verify that /bin, /usr/bin, /var/jboss and /var/lib/mysql are identical to the normally-behaving guests, and that /etc only varies in hostname, IP address, and other instance-specific settings.
Have tried setting their resource utilization to "high" with no effect. (All guest resource utilizations are "normal", except for memory reservations on all JBoss and MySQL guests. The total memory reservations per server are about half of the host memory, and all guest memory sizes added together only use about 70% of the host memory.)
The *.vmx, *.vmxf, and *.vmdk files vary only in uuid, displayName, MAC address and disk/swap filenames.
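The .vmx comparison can be scripted rather than eyeballed. Below is a minimal sketch that parses the `key = "value"` lines of two .vmx files and reports every setting that differs, skipping keys that are expected to vary per instance. The ignore list is my own guess at those keys, not an official one, and the function names are hypothetical.

```python
import re

# Matches lines such as: memsize = "2048"
VMX_LINE = re.compile(r'^\s*([^#\s=]+)\s*=\s*"(.*)"\s*$')

# Assumed per-instance keys (lowercase prefixes); adjust for your own guests.
IGNORED_PREFIXES = ("uuid.", "displayname", "ethernet0.generatedaddress",
                    "ethernet0.address", "sched.swap.derivedname")

def parse_vmx(text):
    """Return a dict of the key/value settings found in .vmx file contents."""
    settings = {}
    for line in text.splitlines():
        m = VMX_LINE.match(line)
        if m:
            settings[m.group(1)] = m.group(2)
    return settings

def diff_vmx(a, b, ignore=IGNORED_PREFIXES):
    """Return {key: (value_in_a, value_in_b)} for settings that differ."""
    diffs = {}
    for key in sorted(set(a) | set(b)):
        if key.lower().startswith(ignore):
            continue
        if a.get(key) != b.get(key):
            diffs[key] = (a.get(key), b.get(key))
    return diffs
```

Feed it the contents of a good guest's .vmx and a bad guest's .vmx; a key present in only one file shows up with `None` on the other side.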
One of the other guests on the same host improperly had the CPU affinity bits checked for all cores on the host. Removing the affinity settings restored normal operation.
There are five of us working on those hosts; I should have checked the configs myself before posting.
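Since the culprit was a neighbouring guest rather than the slow ones, it is worth checking every guest's config on the host. A small sketch for that follows. It assumes the affinity setting is stored in the .vmx under the key `sched.cpu.affinity`, which I believe is the case but have not confirmed for every ESX version; the function name and the input format are made up for illustration.

```python
import re

# Assumption: CPU affinity appears in the .vmx as sched.cpu.affinity = "..."
AFFINITY_LINE = re.compile(
    r'^\s*sched\.cpu\.affinity\s*=\s*"(.*)"\s*$',
    re.IGNORECASE | re.MULTILINE,
)

def find_affinity(vmx_texts):
    """Map guest name -> affinity value for every guest that sets one.

    vmx_texts is a dict of guest name -> .vmx file contents.
    """
    pinned = {}
    for name, text in vmx_texts.items():
        m = AFFINITY_LINE.search(text)
        if m:
            pinned[name] = m.group(1)
    return pinned
```

Any guest that turns up in the result has an explicit affinity entry and deserves a second look in the client.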
EDIT: I can't accept my answer and close the question for 2 days?
Have you looked at esxtop from the service console? It might give you some clues as to what is going on.