We bought some software from a smallish company. It's a 32-bit Windows video content workflow manager, and they have done some customisation of it for us.
It has been working fine for over a year in a VMware ESXi 4.1u2 VM running W2K3EE 32-bit (the platform they support running it on).
Then they updated their code a month or so ago, and we started seeing one of the vCPUs periodically pegged at 100% while the second vCPU stays fairly idle (around 5-7%), so we assumed the code was badly threaded and contacted them about it.
They've now come back saying that their code doesn't work in a VM, that they've known about this requirement for 18 months or so, and that they want us to V2P it. They say they only see this problem when it runs inside a VM. I have a call with their senior programmer scheduled in a few hours to discuss it.
Luckily we have a few physical servers we can do this on; it's a bit time-consuming but doable.
My question, however, is this: given that this VM doesn't touch any hardware directly, is on a very modern host and has very low requirements (2 vCPUs, 4 GB RAM, a 20 GB boot vdisk, a 100 GB data vdisk, a single vNIC and nothing else), what could possibly be the problem with running it in a VM, if there is one?
Obviously I'm pursuing this strongly with them, but I wondered whether anyone else has come across a regular application that somehow misbehaves inside a VM but not on a physical server.
While I can't speak for this vendor or the software package, I have worked for a large (multinational) vendor where one of the products we sold had very specific known issues when running on VMware.
In this case, one issue could cause the software to deadlock, and the other could cause data corruption. As such, customers were advised not to run the software in a virtual environment. Some still did, and in all the cases I was aware of, they ran into one or both of the problems.
So while it is rare, there can be cases where software does not perform as you would expect it to in VMware.
While I realise it doesn't directly help with your problem, it does show that VMware is not always the perfect system.
Footnote: in this case the vendor was able to work with VMware to find resolutions (some code fixes, some VMware configuration changes), and they now have some (very specific) guidance on how to run the software on VMware.
With ESX 5 and its "Monster VM" limits (32 vCPUs, 1 TB RAM), the number of applications that have issues running in a VM is shrinking. Most of the ones I've run into fall into one of two groups:
- apps relying on time being linear (real-time processes or apps that need time to advance linearly; this can usually be tweaked)
- apps causing lots of hardware interrupts or context switching
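If you want to check whether the guest clock itself is misbehaving, one rough test is to sleep for a fixed interval many times and compare how much the wall clock and the monotonic clock advanced each time. A minimal sketch, assuming Python 3 can be run inside the guest: a "jump" means the wall clock moved differently from the monotonic clock (for example because something stepped the clock), and a "stall" means noticeably more time passed than was requested (the guest was not running). The function name `check_clock` and the thresholds are hypothetical choices for illustration, not a vendor or VMware tool, and the check only shows that time is non-linear, not why.

```python
import time
from typing import Callable, List, Tuple


def check_clock(
    wall: Callable[[], float] = time.time,
    mono: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    interval: float = 0.1,
    samples: int = 50,
    tolerance: float = 0.05,
) -> List[Tuple[int, str, float, float]]:
    """Sleep `samples` times and report suspicious intervals.

    Returns (index, kind, wall_delta, mono_delta) tuples, where kind is
    "jump" if the wall and monotonic clocks disagree by more than
    `tolerance` seconds, or "stall" if more than `interval + tolerance`
    seconds passed. Thresholds are arbitrary assumptions; tune them.
    """
    anomalies = []
    prev_wall, prev_mono = wall(), mono()
    for i in range(samples):
        sleep(interval)
        now_wall, now_mono = wall(), mono()
        d_wall = now_wall - prev_wall
        d_mono = now_mono - prev_mono
        if abs(d_wall - d_mono) > tolerance:
            anomalies.append((i, "jump", d_wall, d_mono))
        elif d_mono - interval > tolerance:
            anomalies.append((i, "stall", d_wall, d_mono))
        prev_wall, prev_mono = now_wall, now_mono
    return anomalies


if __name__ == "__main__":
    for entry in check_clock(samples=100):
        print(entry)
```

Running the same script on a physical box and in the VM under similar load gives a rough comparison; the clock functions are parameters only so the logic can be exercised with a fake clock.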
In most cases, you should be able to ask your VMware rep to talk to those guys. I believe VMware still has a team of people dedicated to making things work (they had a support lab just for this in the early days).
As for a solution, I had a similar issue with a VM showing high CPU usage while the host had plenty of CPU resources free. We fixed the issue by migrating to a server with a Nehalem CPU and changing the CPU compatibility level in EVC (if you have a cluster with DRS/HA).
I have seen a similar problem with VMware ESX + Debian 6 + OpenLDAP 2.4.x (whichever exact version of OpenLDAP is apt-gettable).
In day-to-day operation it works OK, but things like importing a largish LDIF file with 400 000 or so entries are very slow (50-100x slower than on physical servers). Also, during long-duration, high-volume benchmarking everything runs smoothly with response times of a couple of milliseconds, but occasionally there are strange peaks ranging from 500 to 25 000(!) milliseconds.
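Peaks like these disappear in averages, so when comparing a VM with a physical server it helps to record every request's latency and look at the tail (p99, max, and the individual spikes). A minimal sketch in Python: `op` is any zero-argument callable you want to time, for example a wrapper around your own LDAP client's search call (not shown; that wrapper is an assumption), and the 100 ms spike threshold is an arbitrary illustrative value. The percentile uses the nearest-rank method, and `summarize` expects a non-empty list.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_calls(op: Callable[[], object], n: int) -> List[float]:
    """Call op() n times and return each call's latency in milliseconds."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        op()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def percentile(sorted_ms: List[float], pct: int) -> float:
    """Nearest-rank percentile of an ascending list; pct is an int 1..100."""
    k = max(1, (pct * len(sorted_ms) + 99) // 100)  # integer ceil(pct*n/100)
    return sorted_ms[k - 1]


def summarize(latencies_ms: List[float], spike_ms: float) -> Dict[str, object]:
    """Median, p99, max, and every sample above spike_ms (in original order)."""
    s = sorted(latencies_ms)
    return {
        "median": statistics.median(s),
        "p99": percentile(s, 99),
        "max": s[-1],
        "spikes": [x for x in latencies_ms if x > spike_ms],
    }


if __name__ == "__main__":
    # Placeholder workload; replace with a real request against the server.
    print(summarize(time_calls(lambda: sum(range(10_000)), 1000), spike_ms=100.0))
```

Running the same workload against the VM and a physical server, and comparing `p99`, `max` and the spike count rather than the median, makes the difference described above measurable.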
On physical servers I'm unable to reproduce these problems. And yes, I spent around three weeks trying to isolate the problem, tuning all kinds of parameters, from operating system settings to slapd values to Berkeley DB values; nothing helped.