I have been reading about KVM and Qemu for some time. As of now I have a clear understanding of what they do.
KVM uses hardware virtualization support to provide near-native performance to guest operating systems. QEMU, on the other hand, emulates the target machine.
What I am confused about is how closely these two cooperate. For example:
- Who manages the sharing of RAM/memory?
- Who schedules I/O operations?
Qemu:
Qemu is a complete, standalone piece of software. You use it to emulate machines; it is very flexible and portable. Mainly it works through a special 'recompiler' that transforms binary code written for one processor into code for another (say, to run MIPS code on a PPC Mac, or ARM code on an x86 PC).
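To make the 'recompiler' idea concrete, here is a toy sketch. It assumes a made-up three-instruction guest ISA (`LOADI`, `ADDI`, `MULI`, all hypothetical) and "recompiles" each guest block into Python source once, caching the result by guest address. Qemu's real translator (TCG) works on real ISAs and emits native machine code, so treat this only as an illustration of translate once, run many times.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy guest ISA: each instruction is (opcode, integer operand).
Instr = Tuple[str, int]
HostBlock = Callable[[Dict[str, int]], None]


def translate(block: List[Instr]) -> HostBlock:
    """Recompile one guest basic block into host code (here: Python source)."""
    lines = ["def host_block(st):"]
    for opcode, arg in block:
        imm = int(arg)  # only integer immediates reach the generated source
        if opcode == "LOADI":
            lines.append(f"    st['acc'] = {imm}")
        elif opcode == "ADDI":
            lines.append(f"    st['acc'] += {imm}")
        elif opcode == "MULI":
            lines.append(f"    st['acc'] *= {imm}")
        else:
            raise ValueError(f"unknown guest opcode {opcode!r}")
    lines.append("    return None")  # keeps the function valid for an empty block
    namespace: Dict[str, object] = {}
    exec("\n".join(lines), namespace)  # "emit" the host code
    return namespace["host_block"]  # type: ignore[return-value]


class ToyTranslator:
    """Translation cache: each guest block is recompiled once, then reused."""

    def __init__(self) -> None:
        self.cache: Dict[int, HostBlock] = {}
        self.translations = 0

    def run(self, guest_pc: int, block: List[Instr], state: Dict[str, int]) -> None:
        if guest_pc not in self.cache:
            self.cache[guest_pc] = translate(block)
            self.translations += 1
        self.cache[guest_pc](state)
```

Running the same block at the same guest address twice translates it only once; the cache is what lets a recompiler amortize its translation cost.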
To emulate more than just the processor, Qemu includes a long list of peripheral emulators: disk, network, VGA, PCI, USB, serial/parallel ports, etc.
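Peripheral emulation largely comes down to routing each guest access to the right device model. A minimal sketch, assuming a hypothetical `IOBus`/`ToySerial` pair: the bus maps I/O port ranges to device objects, and the toy UART at 0x3F8 (the conventional COM1 base) turns bytes written to its data port into text. Qemu's real device models are far more complete.

```python
from typing import List, Optional, Protocol, Tuple


class PortDevice(Protocol):
    def write(self, port: int, value: int) -> None: ...
    def read(self, port: int) -> int: ...


class ToySerial:
    """Hypothetical UART model: bytes written to its data port become output text."""

    def __init__(self, base: int) -> None:
        self.base = base
        self.output: List[str] = []

    def write(self, port: int, value: int) -> None:
        if port == self.base:  # data register; other registers are ignored here
            self.output.append(chr(value & 0xFF))

    def read(self, port: int) -> int:
        return 0


class IOBus:
    """Routes guest port I/O to whichever emulated device claims the port."""

    def __init__(self) -> None:
        self.ranges: List[Tuple[int, int, PortDevice]] = []

    def register(self, base: int, length: int, dev: PortDevice) -> None:
        self.ranges.append((base, length, dev))

    def _find(self, port: int) -> Optional[PortDevice]:
        for base, length, dev in self.ranges:
            if base <= port < base + length:
                return dev
        return None

    def write(self, port: int, value: int) -> None:
        dev = self._find(port)
        if dev is not None:  # writes to unclaimed ports are silently dropped
            dev.write(port, value)

    def read(self, port: int) -> int:
        dev = self._find(port)
        # Toy choice: reads from unclaimed ports return all ones.
        return dev.read(port) if dev is not None else 0xFF
```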
KQemu:
In the specific case where both source and target are the same architecture (like the common case of x86 on x86), it still has to parse the code to remove any 'privileged instructions' and replace them with context switches. To make it as efficient as possible on x86 Linux, there's a kernel module called KQemu that handles this.
Being a kernel module, KQemu is able to execute most code unchanged, replacing only the lowest-level ring0-only instructions. In that case, userspace Qemu still allocates all the RAM for the emulated machine, and loads the code. The difference is that instead of recompiling the code, it calls KQemu to scan/patch/execute it. All the peripheral hardware emulation is done in Qemu.
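The scan/patch/execute step can be sketched as follows. This is a toy model, not KQemu's actual code: it assumes a hypothetical instruction list in which `CLI`, `STI` and `HLT` stand in for ring0-only instructions. Those are replaced by traps into an emulating "monitor", while everything else runs unchanged.

```python
from typing import Callable, Dict, List, Tuple

Instr = Tuple[str, int]
CPU = Dict[str, int]

# Hypothetical subset of ring0-only instructions; the names are illustrative.
PRIVILEGED = {"CLI", "STI", "HLT"}


def patch(code: List[Instr]) -> List[Tuple[str, object]]:
    """Replace each privileged instruction with a trap carrying the original."""
    return [("TRAP", ins) if ins[0] in PRIVILEGED else ins for ins in code]


def monitor(ins: Instr, cpu: CPU) -> None:
    """Emulate a trapped privileged instruction on the virtual CPU state."""
    op, _ = ins
    if op == "CLI":
        cpu["if"] = 0
    elif op == "STI":
        cpu["if"] = 1
    elif op == "HLT":
        cpu["halted"] = 1


def run_patched(code: List[Instr], cpu: CPU,
                handler: Callable[[Instr, CPU], None] = monitor) -> int:
    """Run patched code and return how many traps (context switches) occurred."""
    traps = 0
    for op, arg in patch(code):
        if op == "TRAP":
            traps += 1
            handler(arg, cpu)  # leave the guest and emulate, like a context switch
        elif op == "ADD":
            cpu["eax"] += arg  # unprivileged: runs unchanged
        elif op != "NOP":
            raise ValueError(f"unknown instruction {op!r}")
    return traps
```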
This is a lot faster than plain Qemu because most code runs unchanged, but it still has to transform ring0 code (most of the code in the VM's kernel), so performance still suffers.
KVM:
KVM is a couple of things. First, it is a Linux kernel module (now included in mainline) that switches the processor into a new 'guest' state. The guest state has its own set of ring states, but privileged ring0 instructions fall back to the hypervisor code. Since this is a new processor mode of execution, the guest code doesn't have to be modified in any way.
Apart from the processor state switching, the kernel module also handles a few low-level parts of the emulation, like the MMU registers (used to handle virtual memory) and some parts of the emulated PCI hardware.
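The kernel module is exposed to userspace as the character device `/dev/kvm`, driven with `ioctl()` calls. The sketch below assumes a Linux host and the asm-generic ioctl encoding used on x86 and arm64. It asks for the KVM API version and tries to create an empty VM, which the kernel hands back as a new file descriptor. The request numbers are derived from `<linux/kvm.h>`; check them against your kernel headers before relying on them.

```python
import fcntl
import os
from typing import Optional, Tuple

# ioctl request numbers from <linux/kvm.h>. KVMIO is 0xAE; these requests use
# the _IO() macro, which in the asm-generic encoding (x86, arm64) is
# (type << 8) | nr for ioctls that take no struct argument.
KVMIO = 0xAE


def _IO(type_: int, nr: int) -> int:
    return (type_ << 8) | nr


KVM_GET_API_VERSION = _IO(KVMIO, 0x00)
KVM_CREATE_VM = _IO(KVMIO, 0x01)


def probe_kvm(path: str = "/dev/kvm") -> Optional[Tuple[int, bool]]:
    """Return (API version, whether a VM fd could be created), or None if no KVM."""
    try:
        kvm_fd = os.open(path, os.O_RDWR | os.O_CLOEXEC)
    except OSError:
        return None  # module not loaded, no hardware support, or no permission
    try:
        version = fcntl.ioctl(kvm_fd, KVM_GET_API_VERSION)  # the stable API reports 12
        try:
            vm_fd = fcntl.ioctl(kvm_fd, KVM_CREATE_VM, 0)  # new fd representing the VM
        except OSError:
            return version, False
        os.close(vm_fd)
        return version, True
    finally:
        os.close(kvm_fd)
```

On a host without KVM, or without permission to open `/dev/kvm`, it simply returns `None`.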
Second, KVM is a fork of the Qemu executable. Both teams work actively to keep the differences to a minimum, and progress is being made in reducing them. Eventually, the goal is for Qemu to work anywhere and to use the KVM kernel module automatically when it is available. But for the foreseeable future, the Qemu team focuses on hardware emulation and portability, while the KVM developers focus on the kernel module (sometimes moving small parts of the emulation there, if that improves performance) and on interfacing with the rest of the userspace code.
The kvm-qemu executable works like normal Qemu: it allocates RAM and loads the code, but instead of recompiling the code or calling KQemu, it spawns a thread (this is important). The thread calls the KVM kernel module to switch to guest mode and proceeds to execute the VM code. On a privileged instruction, the processor switches back to the KVM kernel module, which, if necessary, signals the Qemu thread to handle most of the hardware emulation.
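The control flow of that thread can be modeled in a few lines. This is a simulation, not real KVM code: `FakeVcpu` is a hypothetical stand-in whose `run()` plays the role of `ioctl(vcpu_fd, KVM_RUN)`, and the guest is a scripted list of events. Page faults are resolved "in the kernel" without leaving `run()`, while port I/O and HLT exit to userspace, which emulates the serial port. The numeric exit reasons mirror `KVM_EXIT_IO` and `KVM_EXIT_HLT`.

```python
import enum
from dataclasses import dataclass
from typing import Iterable, List, Tuple


class ExitReason(enum.IntEnum):
    # Numeric values mirror KVM_EXIT_IO and KVM_EXIT_HLT in <linux/kvm.h>.
    IO = 2
    HLT = 5


@dataclass
class VcpuExit:
    reason: ExitReason
    port: int = 0
    value: int = 0


class FakeVcpu:
    """Hypothetical stand-in for a vCPU fd; run() plays the role of KVM_RUN."""

    def __init__(self, guest_events: Iterable[Tuple[str, int, int]]) -> None:
        self._events = iter(guest_events)
        self.handled_in_kernel = 0

    def run(self) -> VcpuExit:
        for kind, port, value in self._events:
            if kind == "pagefault":
                # Resolved inside the "kernel module"; userspace never sees it.
                self.handled_in_kernel += 1
            elif kind == "out":
                return VcpuExit(ExitReason.IO, port, value)
            elif kind == "hlt":
                return VcpuExit(ExitReason.HLT)
        return VcpuExit(ExitReason.HLT)  # toy choice: end of script acts like HLT


def vcpu_loop(vcpu: FakeVcpu, serial: List[str], serial_port: int = 0x3F8) -> int:
    """Body of a vCPU thread: enter the guest, handle each exit, repeat."""
    exits = 0
    while True:
        ex = vcpu.run()  # with real KVM, this blocks while guest code runs natively
        exits += 1
        if ex.reason == ExitReason.HLT:
            return exits
        if ex.reason == ExitReason.IO and ex.port == serial_port:
            serial.append(chr(ex.value))  # userspace emulates the device
```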
One of the nice things about this architecture is that the guest code runs in a POSIX thread, which you can manage with normal Linux tools. If you want a VM with 2 or 4 cores, kvm-qemu creates 2 or 4 threads, and each of them calls the KVM kernel module to start executing. Concurrency (if you have enough real cores) or scheduling (if not) is handled by the normal Linux scheduler, which keeps the code small and surprises limited.
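Because each vCPU is an ordinary thread, it has its own kernel thread ID that tools such as `top -H` or `taskset` can see. A minimal sketch, assuming Python 3.8+ on Linux: it starts n placeholder "vCPU" threads (they do no guest work) and records each thread's native ID. A barrier keeps them all alive until every ID is recorded, so the IDs are guaranteed to be distinct.

```python
import threading
from typing import List


def start_vcpu_threads(n: int) -> List[int]:
    """Start n placeholder vCPU threads and return their kernel thread IDs."""
    # n workers plus the caller: every thread stays alive until all IDs are
    # recorded, so the kernel cannot reuse an ID within this group.
    barrier = threading.Barrier(n + 1)
    tids = [0] * n

    def vcpu(index: int) -> None:
        tids[index] = threading.get_native_id()  # the TID the Linux scheduler sees
        barrier.wait()
        # A real vCPU thread would now loop on ioctl(vcpu_fd, KVM_RUN).

    threads = [threading.Thread(target=vcpu, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    barrier.wait()
    for t in threads:
        t.join()
    return tids
```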
When working together, KVM arbitrates access to the CPU and memory, and QEMU emulates the hardware resources (hard disk, video, USB, etc.). When working alone, QEMU emulates both CPU and hardware.
QEMU is processor-emulating virtualization software with support for many virtual devices (such as HDD, RAM, sound, Ethernet, USB, VGA, etc.).
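KVM is a kernel module that lets guest code run directly on the host's CPU cores using hardware virtualization extensions instead of emulating the processor; with the host-passthrough CPU mode, the guest also sees the host's exact CPU model and features. Together with the vfio-pci kernel module, it also allows passing PCI devices through to a guest.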
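Binding a device to vfio-pci is done through sysfs. A minimal sketch, assuming the vfio-pci module is already loaded (`modprobe vfio-pci`), the IOMMU is enabled, and the whole IOMMU group can be given to the guest: `vfio_bind_plan` only computes the writes (set the driver override, unbind the current driver, re-probe), and the hypothetical helper `apply_plan` performs them, which requires root. The PCI address in the example is illustrative.

```python
import os
import re
from typing import List, Tuple

# Full PCI address: domain:bus:device.function, lowercase hex as in sysfs.
PCI_ADDR = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def vfio_bind_plan(bdf: str, sysfs: str = "/sys/bus/pci") -> List[Tuple[str, str]]:
    """Return the sysfs writes that rebind PCI device `bdf` to vfio-pci."""
    if not PCI_ADDR.match(bdf):
        raise ValueError(f"not a PCI address: {bdf!r}")
    dev = f"{sysfs}/devices/{bdf}"
    return [
        (f"{dev}/driver_override", "vfio-pci"),  # prefer vfio-pci for this device
        (f"{dev}/driver/unbind", bdf),           # detach it from its host driver
        (f"{sysfs}/drivers_probe", bdf),         # re-probe so vfio-pci claims it
    ]


def apply_plan(plan: List[Tuple[str, str]]) -> None:
    """Hypothetical helper that performs the writes (needs root)."""
    for path, value in plan:
        if path.endswith("/driver/unbind") and not os.path.exists(path):
            continue  # device currently has no driver bound
        with open(path, "w") as f:
            f.write(value)
```

Once the device is bound to vfio-pci, QEMU can hand it to a guest with `-device vfio-pci,host=01:00.0`.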
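PCI passthrough is made possible by the IOMMU (input-output memory management unit), which translates the addresses a device uses for DMA into host physical addresses. Because the guest's memory is mapped this way, the guest can drive a passed-through device directly, with near bare-metal (native) performance, while DMA outside its mappings is blocked. The IOMMU is partly hardware in the chipset and partly software in the kernel; it is marketed as Intel VT-d and AMD-Vi (not to be confused with the CPU virtualization extensions VT-x and AMD-V, whose CPU flags are vmx and svm). SR-IOV is a PCI Express feature, implemented by the device itself, that lets one physical device present itself as many virtual functions, each of which can be passed through to a different guest with little performance loss.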
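The IOMMU's job can be pictured as a page table consulted on every DMA. A toy sketch, assuming a hypothetical single-level table and 4 KiB pages (real VT-d and AMD-Vi tables are multi-level and managed by the kernel): the guest-physical range is mapped onto the host pages that back the guest's RAM, and a DMA to anything unmapped faults instead of touching host memory.

```python
from typing import Dict

PAGE_SHIFT = 12
PAGE_SIZE = 1 << PAGE_SHIFT  # 4 KiB pages


class ToyIommu:
    """Hypothetical single-level IOMMU table: I/O virtual page -> host physical page."""

    def __init__(self) -> None:
        self.table: Dict[int, int] = {}

    def map(self, iova: int, phys: int, size: int) -> None:
        if (iova | phys | size) & (PAGE_SIZE - 1):
            raise ValueError("IOMMU mappings must be page aligned")
        for off in range(0, size, PAGE_SIZE):
            self.table[(iova + off) >> PAGE_SHIFT] = (phys + off) >> PAGE_SHIFT

    def translate(self, iova: int) -> int:
        """Translate a device DMA address, faulting if it was never mapped."""
        page = iova >> PAGE_SHIFT
        if page not in self.table:
            raise PermissionError(f"IOMMU fault: DMA to unmapped IOVA {iova:#x}")
        return (self.table[page] << PAGE_SHIFT) | (iova & (PAGE_SIZE - 1))
```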
Libvirt is a library that lets you configure virtual machines from Python and other programming languages. Virsh is a command-line tool for monitoring and configuring virtual machine settings. Virt-manager is a VMware Player-like GUI alternative to virsh; it also uses libvirt.
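As an example of configuring a VM from Python, the sketch below builds a minimal libvirt domain definition using only the standard library. It assumes a KVM host, a qcow2 disk at a path you choose, and the host-passthrough CPU mode; a real definition would usually also add networking and graphics.

```python
import xml.etree.ElementTree as ET


def domain_xml(name: str, memory_mib: int, vcpus: int, disk_path: str) -> str:
    """Build a minimal libvirt <domain> definition for a KVM guest."""
    dom = ET.Element("domain", type="kvm")  # run under KVM, not plain QEMU emulation
    ET.SubElement(dom, "name").text = name
    ET.SubElement(dom, "memory", unit="MiB").text = str(memory_mib)
    ET.SubElement(dom, "vcpu").text = str(vcpus)  # one host thread per vCPU
    os_el = ET.SubElement(dom, "os")
    ET.SubElement(os_el, "type").text = "hvm"  # fully virtualized guest
    ET.SubElement(dom, "cpu", mode="host-passthrough")
    devices = ET.SubElement(dom, "devices")
    disk = ET.SubElement(devices, "disk", type="file", device="disk")
    ET.SubElement(disk, "driver", name="qemu", type="qcow2")
    ET.SubElement(disk, "source", file=disk_path)
    ET.SubElement(disk, "target", dev="vda", bus="virtio")  # paravirtualized disk
    return ET.tostring(dom, encoding="unicode")
```

With the libvirt Python bindings installed, the XML can be registered and started with `conn = libvirt.open("qemu:///system")`, `dom = conn.defineXML(xml)` and `dom.create()`, which is what `virsh define` followed by `virsh start` does from the terminal.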
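Qemu-img is a CLI tool that creates, converts, and snapshots disk images. Qemu-nbd is also a CLI tool; it exports a virtual disk image over NBD (Network Block Device), giving raw block-level I/O access to it over the network. Virtio is a family of paravirtualized drivers and device interfaces for disks, NICs (Ethernet), and video: instead of emulating real hardware, the guest talks to QEMU through an efficient, virtualization-aware interface. Virgil (virgl) adds OpenGL support to virtio VGA. Red Hat and Fedora provide ISO CD-ROM images with virtio drivers for Windows guests on their websites; Linux guests already have virtio drivers in the mainline kernel.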
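These CLI tools are easy to script. A minimal sketch, assuming `qemu-img` and `qemu-nbd` are on the PATH: the helpers only build argument lists for creating, converting and snapshotting images and for attaching an image as a local NBD block device, and `run_tool` returns the command line as a string unless `dry_run=False`. Attaching with `qemu-nbd --connect` additionally assumes the `nbd` kernel module is loaded and root privileges.

```python
import shlex
import subprocess
from typing import List, Optional


def qemu_img_create(path: str, size: str, fmt: str = "qcow2") -> List[str]:
    return ["qemu-img", "create", "-f", fmt, path, size]


def qemu_img_convert(src: str, dst: str,
                     src_fmt: str = "raw", dst_fmt: str = "qcow2") -> List[str]:
    return ["qemu-img", "convert", "-f", src_fmt, "-O", dst_fmt, src, dst]


def qemu_img_snapshot(path: str, name: str) -> List[str]:
    # -c creates an internal snapshot stored inside the qcow2 file
    return ["qemu-img", "snapshot", "-c", name, path]


def qemu_nbd_connect(path: str, device: str = "/dev/nbd0") -> List[str]:
    # exposes the image as a local block device (needs the nbd module and root)
    return ["qemu-nbd", f"--connect={device}", path]


def run_tool(argv: List[str], dry_run: bool = True) -> Optional[str]:
    if dry_run:
        return shlex.join(argv)  # printable command line; nothing is executed
    subprocess.run(argv, check=True)
    return None
```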
OVMF (Open Virtual Machine Firmware) provides UEFI firmware for QEMU virtual machines. SPICE is a remote-display protocol for QEMU virtual machines and a faster, more capable alternative to VNC.
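Putting the pieces together, a single QEMU invocation can use KVM for the CPU, OVMF for UEFI firmware, virtio for disk and network, and SPICE for the display. A sketch that only builds the argument list, assuming `qemu-system-x86_64` and an OVMF package are installed; the OVMF file paths differ between distributions, and the VARS file should be a writable per-VM copy.

```python
from typing import List


def qemu_cmdline(disk: str, ovmf_code: str, ovmf_vars: str,
                 mem_mib: int = 2048, smp: int = 2, spice_port: int = 5930) -> List[str]:
    """Build a QEMU command line for a KVM-accelerated UEFI guest."""
    return [
        "qemu-system-x86_64",
        "-machine", "q35,accel=kvm",  # use KVM instead of TCG recompilation
        "-cpu", "host",               # expose the host CPU model (requires KVM)
        "-m", str(mem_mib),
        "-smp", str(smp),             # number of vCPUs, i.e. vCPU threads
        # OVMF: read-only firmware code plus a writable variable store
        "-drive", f"if=pflash,format=raw,readonly=on,file={ovmf_code}",
        "-drive", f"if=pflash,format=raw,file={ovmf_vars}",
        "-drive", f"file={disk},format=qcow2,if=virtio",  # virtio disk
        "-nic", "user,model=virtio-net-pci",              # virtio NIC, user networking
        "-vga", "qxl",                                    # display device used with SPICE
        "-spice", f"port={spice_port},disable-ticketing=on",
    ]
```

The list can be passed straight to `subprocess.run`; connecting a SPICE client to port 5930 then shows the guest's display.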
You can start experimenting by entering these in a terminal on Ubuntu or any other Debian-based system:
Hands-on experience clarifies what each of these semi-conceptual terms actually does, because it lets you answer the question "What wouldn't be possible without X?".