We have a large Xen deployment running on both RHEL and CentOS, and we have recently started looking at KVM since that appears to be where the future of VMs on Linux lies.
We can load the server and get everything running without issue. However, when we bring up a new guest with JBoss (4.2 Community Edition, Sun JDK 6) and deploy a large EAR, the server goes a little crazy: %sy jumps to 80-99% and just hangs there for long periods, and we see a similar jump in %us on the host machine. We thought at first this might be I/O, since it seems to happen at JBoss startup and then "cool down" once everything is loaded, but we ran some tests extracting large tar.gz files and running jar -xvf on the EAR and could not re-create it.
Then we started thinking this might be some kind of memory access issue. We ran a C program that generates a lot of memory activity and, sure enough, we saw the spikes again. Not as high, mind you, but a clear jump. We then wrote a small Java program to do the same thing and saw the same spike.
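(For anyone who wants to approximate the test without our programs: the general pattern was just generating heavy memory traffic inside the guest, so a generic memory-churn tool such as stress should produce a similar load, e.g.

    stress --vm 4 --vm-bytes 512M --timeout 60

run inside the guest while watching %sy on the host.)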
Any thoughts on what might be causing this? Is this just the way KVM works?
As a side note, we do NOT see this behavior on any other setup (Xen, VMware, or bare metal). The system does also seem a bit slower than our Xen and VMware ones.
A question and a suggestion:
What file system are you using? On Fedora 12 and 13 systems I have seen excellent performance with ext4, but abysmal performance with btrfs.
Extending @Ophidian's comment: try it on Fedora 13 to see how it runs with recent KVM and libvirt versions.
I'd like to see the testing code you used to recreate this, if possible. I'm testing a lot of KVM guests all the time, and an extra benchmark is always good, especially one that's known to cause problems.
Knowing the VM config would help as well: how much RAM, how many CPUs, disk and network device types, etc.
I wonder if you're somehow hitting swap on the host machine.
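An easy way to check: run vmstat on the host during one of the spikes; sustained non-zero values in the si/so (swap-in/swap-out) columns would confirm it:

    vmstat 1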
Have you tried, or could you try, this on Fedora 12 to see whether the more recent versions of everything (kvm, libvirt, kernel, ksm) exhibit the same behaviour? KSM was added in Fedora 12 and purportedly provides much more efficient memory management.
In addition to 3dinfluence's suggestion to contact Red Hat support and the KVM mailing list, I would also suggest the fedora-virt list, as a large number of the developers of libvirt, KVM, and the rest of the Fedora virtualization ecosystem are regular readers. They are generally very responsive and helpful.

Are you running ksmd (the shared memory de-duper)? It can have a bit of CPU overhead on the hypervisor system, and could conceivably be involved in memory-related stalls.
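Assuming a kernel new enough to expose the KSM sysfs interface, you can check on the host with:

    cat /sys/kernel/mm/ksm/run            # 1 = KSM is scanning
    cat /sys/kernel/mm/ksm/pages_sharing  # >0 = pages are actually being merged

Temporarily writing 0 to /sys/kernel/mm/ksm/run would tell you whether ksmd is a factor.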
It may also be worth seeing if pinning the VM to a core helps: you may find that the guest being switched from core to core (or across processors) is causing cache flushes that screw things up under memory-intensive workloads.
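If the guest is managed through libvirt, pinning can be done with virsh (the domain name and CPU numbers here are placeholders):

    virsh vcpupin myguest 0 2    # pin vCPU 0 of domain 'myguest' to physical CPU 2

or for any process with taskset, e.g. taskset -cp 2 <qemu-pid>.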
Bearing in mind that every VM is a process as far as the hypervisor is concerned, it could be illuminating to attach strace to the guest process and see if any syscalls are obviously problematic.
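For example, attach during a spike with the summary option and hit Ctrl-C after a while to get per-syscall counts and times (finding the PID via pgrep assumes a qemu-kvm process name):

    strace -c -p $(pgrep -f qemu-kvm | head -1)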
Finally, I've found Dag Wieers' dstat a nice addition to my toolkit for tracking down problems.
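Something like the following gives a rolling per-interval view of CPU, memory, paging, and swap side by side, which maps directly onto the %sy spikes described above:

    dstat -tcmgs 5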
(Since we're about to look at piloting JBoss under KVM, this is relevant to my interests...)