I'm running a hundred Ubuntu 16.04 LTS servers with basically identical hardware distributed worldwide. (I'm working on upgrading them to 20.04 LTS but certain unfortunate design decisions on the part of Ubuntu are still blocking this.) Each of these servers is running a KVM VM with Windows 10 Enterprise. Three of them show the following problem:
Without any apparent cause, monitoring shows the server's Linux load average jumping up to above 2. top shows the CPU load from the qemu-system-x86 process running the Windows VM solidly at 200%, matching the 2 cores assigned to the VM. The Windows desktop accessed through VNC appears extremely sluggish. Windows Task Manager shows a process "System interrupts" consuming 100% CPU.
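To make the host-side observation reproducible, the check could be scripted roughly as follows. This is only a sketch: the function names `load_exceeds` and `qemu_thread_snapshot` and the threshold of 2 are illustrative assumptions, not part of the monitoring setup described above.

```shell
#!/bin/sh
# Sketch only: helper names and the threshold are illustrative assumptions.

# load_exceeds FILE THRESHOLD
# Succeeds (exit status 0) if the first field of FILE, which is expected
# to be in the format of /proc/loadavg, is greater than THRESHOLD.
load_exceeds() {
    awk -v t="$2" '{ exit !($1 > t) }' "$1"
}

# qemu_thread_snapshot
# Prints one batch-mode top sample with per-thread CPU usage for each
# process whose name matches qemu-system-x86.
qemu_thread_snapshot() {
    for pid in $(pgrep qemu-system-x86); do
        top -b -H -n 1 -p "$pid"
    done
}
```

Run periodically, for example from cron, `load_exceeds /proc/loadavg 2 && { date; qemu_thread_snapshot; }` would log a timestamped per-thread view of the VM process whenever the 1-minute load average is above 2.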
Rebooting the Windows VM does not fix the situation. It persists for several hours or even days and then goes back to normal on its own, again without any apparent cause or reason.
Researching reasons for high CPU usage by "System interrupts" in Windows turns up a general consensus that this is a hardware issue. The hardware running Windows in this case is virtual, namely the KVM hypervisor. The physical hardware of the hosts did not change before or after the high load episodes, nor does it differ significantly between the servers that show these episodes and those that don't. The Linux host system does not show any signs of malfunction except the excessive load from the Windows guest. Inspection of the Linux logs on the affected systems has turned up nothing unusual. The Windows event logs show the obvious heaps of secondary errors during the high load episodes, such as services not responding, but nothing indicating a possible cause.
Where would I begin to look for possible causes of that behaviour?
For the sake of completeness, this is my KVM invocation:
kvm \
-daemonize \
-name "$vmname64-$(hostname)" \
-drive file="/srv/kvm/${vmname64}.qcow2",if=virtio \
-net nic,model=virtio,macaddr=$macaddr64 -net tap \
-vga std \
-rtc base=localtime \
-usb -usbdevice tablet \
-nodefaults \
-runas srvadmin \
-chroot /home/srvadmin \
-k de \
-smp 2 \
-m 4096 \
-vnc :1,password \
-monitor mon:telnet:127.0.0.1:4445,server,nowait