We've got a KVM host running Ubuntu 9.10 on a newer quad-core Xeon CPU with hyperthreading. As detailed on Intel's product page, the processor has 4 cores but 8 threads. /proc/cpuinfo and htop both list 8 processors, though cpuinfo reports 4 cores for each. KVM/QEMU also reports 8 VCPUs available to assign to guests.
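For reference, here is roughly how I'm reading the topology (a sketch using the standard /proc/cpuinfo fields):

    grep -c '^processor' /proc/cpuinfo         # 8 -> logical CPUs (threads)
    grep '^siblings' /proc/cpuinfo | sort -u   # siblings : 8 -> threads per socket
    grep '^cpu cores' /proc/cpuinfo | sort -u  # cpu cores : 4 -> physical cores per socket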
My question is: when I'm allocating VCPUs to VM guests, should I allocate per-core or per-thread? Since KVM/QEMU reports 8 VCPUs available to allocate, should I go ahead and set a guest to use 4 CPUs where I previously would have set it to use 2 (back when I counted 4 total VCPUs, one per core)? I'd like to get the most out of the host hardware without over-allocating.
Update: Chopper3's answer is undoubtedly the right approach. However, I'd still love to hear from any hardware experts out there who could elucidate the performance aspects of threads vs. cores... anyone?
Set the lowest number of vCPUs your servers need to perform their function; don't over-allocate them, or you can easily slow down your VMs.
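For example, to cap an existing libvirt guest at two vCPUs, a minimal sketch ("myguest" is a placeholder domain name):

    virsh shutdown myguest
    virsh edit myguest    # change <vcpu>4</vcpu> to <vcpu>2</vcpu> in the domain XML
    virsh start myguest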
Typically, HT works well on workloads that are heavier on IO: while one virtual CPU waits on IO, the CPU can schedule in processing tasks from the other virtual CPU's queue. Really, all the HT subsystem gets you is hardware-accelerated context switching -- which is the same workload pattern used when switching between VMs. So HT will usually reduce the slowdown a bit when you have more VMs than cores, provided each VM gets one virtual core.
Assigning multiple vCPUs to a VM can improve performance if the apps in the VM are written for threading, but it also makes life harder for the hypervisor: it has to allocate time on 2 or 4 CPUs at once, so if you have a quad-core CPU and a quad-vCPU VM, only that one VM can be scheduled during a given timeslice (whereas the host could run 4 different single-vCPU VMs at once).
This is rather tricky. Depending on the load, HT can increase performance by ~30% or decrease it. Normally I advise not to allocate more vCPUs to a single VM than you have physical cores, but if the VM is rather idle (and such a VM won't really require many CPUs anyway), it can be given up to as many vCPUs as you have threads. The point is that you don't want to give a single VM more vCPUs than you have schedulable cores. And in any case, @Chopper3's advice is right: don't give a VM more vCPUs than it absolutely requires.
So, depending on how loaded and critical your VMs are, you either don't over-allocate at all, stick to the physical core count, or go as high as the thread count per VM.
Now, getting into the question of HT: it is generally a good thing to have, especially when you commit more vCPUs to your VMs than you have physical cores (or even threads), because it makes it easier for the Linux scheduler to place those vCPUs.
One last thing: with KVM, a vCPU assigned to a VM is just a process (a thread of the qemu process) on the host, scheduled by the Linux scheduler like any other task, so all the normal optimizations you can do here easily apply. Moreover, the cores/sockets setting only affects how the CPUs are presented to the VM's guest OS; on the host it's still just processes, regardless of how the VM sees it.
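You can see this for yourself; a quick sketch ("myguest" is a placeholder domain name):

    ps -eLf | grep qemu      # each vCPU shows up as a thread of the qemu process
    virsh vcpuinfo myguest   # shows which host CPU each vCPU is currently running on

Since they are ordinary tasks, nice levels and CPU affinity apply to them just as they would to any other process.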
To elaborate on Chopper3's answer: if the systems are mostly CPU-idle, don't assign a bunch of vCPUs; if they are CPU-intensive, be very careful not to over-allocate. You should be able to allocate a total of 8 vCPUs without contention. You can over-allocate beyond that, but if you do, make sure no single guest, especially a CPU-intensive one, has 8 vCPUs, or you will have contention. I don't know the KVM scheduler mechanism well enough to be more specific than that.
The above is based on the following understanding of vCPU versus pinned CPU, plus the assumption that KVM will allow a single guest (or multiple guests) to hog all the actual CPU from the others if you allocate it (or them) enough threads:

    vCPU ~ host thread, guest CPU
    pinned CPU = host core, guest CPU

(I haven't played with mixing vCPUs and pinned CPUs on the same guest, because I don't have hyperthreading.)
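If you do want to experiment with pinning, here's a minimal sketch ("myguest" is a placeholder domain name; the logical CPU numbering depends on your topology, so check the sibling map first):

    # which logical CPUs share a physical core with cpu0 (its HT siblings)
    cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
    # pin the guest's vCPU 0 to host logical CPU 2
    virsh vcpupin myguest 0 2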