I had two of my CPUs lock up on one of my servers. From dmesg:
BUG: soft lockup - CPU#1 stuck for 23s! [vmx-vcpu-0:6148]
and later:
BUG: soft lockup - CPU#2 stuck for 23s! [vmx-vcpu-0:6148]
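Both reports share one layout: the CPU number, how long that CPU went without scheduling, and the task that was running on it as [name:pid]. Here both lines name the same thread, vmx-vcpu-0 with PID 6148. If you want to pull these out of a longer dmesg dump, a minimal Python sketch could look like the one below. It assumes the message text matches the lines above. Newer kernels prefix it with "watchdog: ", and the regex search tolerates that. `parse_soft_lockups` is a hypothetical helper name.

```python
import re

# Matches the soft-lockup lines quoted above. re.search (not re.match) is used
# so that timestamps or a "watchdog: " prefix in front of "BUG:" are tolerated.
# The greedy task-name group backtracks to the last ":<digits>]", so task
# names that themselves contain ':' still parse.
SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>.+):(?P<pid>\d+)\]"
)


def parse_soft_lockups(lines):
    """Return (cpu, seconds, task_name, pid) for every soft-lockup line."""
    events = []
    for line in lines:
        m = SOFT_LOCKUP.search(line)
        if m:
            events.append((int(m["cpu"]), int(m["secs"]), m["comm"], int(m["pid"])))
    return events


if __name__ == "__main__":
    sample = [
        "BUG: soft lockup - CPU#1 stuck for 23s! [vmx-vcpu-0:6148]",
        "some unrelated kernel message",
        "BUG: soft lockup - CPU#2 stuck for 23s! [vmx-vcpu-0:6148]",
    ]
    for cpu, secs, comm, pid in parse_soft_lockups(sample):
        print(f"CPU {cpu}: stuck {secs}s in {comm} (pid {pid})")
```

In practice you would feed it the lines of `dmesg` output instead of the hard-coded sample.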
I'm trying to figure out why this would happen; the processor has 4 cores with hyperthreading, so the OS sees 8 logical CPUs. But my main question is related to this:
When looking at htop over SSH after the freeze, I see that CPUs #2 and #3 (guessing these correspond to #1 and #2 from dmesg) are both stuck at 100% with apparently no processes using them.
None of the processes were using more than 5% CPU. Why would these display 100% utilization? Are they still considered locked by the kernel?
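As far as I know, htop's per-CPU meters on Linux are computed from the counters in /proc/stat. Time a CPU spends executing kernel code lands in the "system" column and counts as busy just like user time. So a CPU spinning in kernel mode shows as fully busy in those meters. Below is a rough sketch of that per-CPU calculation. It assumes the column order documented in proc(5). The function names are hypothetical, and this is not htop's actual code.

```python
import os
import time

# Per-CPU /proc/stat columns, in order:
# user nice system idle iowait irq softirq steal guest guest_nice
# Only the first eight are summed: guest time is, to my understanding,
# already included in user/nice, so adding it again would double-count.
N_FIELDS = 8
IDLE, IOWAIT = 3, 4


def parse_proc_stat(text):
    """Map 'cpu0', 'cpu1', ... to their jiffy counters (aggregate 'cpu' line skipped)."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            counters[fields[0]] = [int(x) for x in fields[1:1 + N_FIELDS]]
    return counters


def busy_percent(before, after):
    """Per-CPU busy percentage between two /proc/stat snapshots.

    Idle means idle + iowait; everything else, including 'system'
    (time spent executing kernel code), counts as busy.
    """
    a, b = parse_proc_stat(before), parse_proc_stat(after)
    result = {}
    for cpu in a.keys() & b.keys():
        delta = [new - old for old, new in zip(a[cpu], b[cpu])]
        total = sum(delta)
        idle = sum(delta[i] for i in (IDLE, IOWAIT) if i < len(delta))
        result[cpu] = 100.0 * (total - idle) / total if total else 0.0
    return result


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    with open("/proc/stat") as f:
        first = f.read()
    time.sleep(1)  # sample interval
    with open("/proc/stat") as f:
        second = f.read()
    usage = busy_percent(first, second)
    for cpu in sorted(usage, key=lambda name: int(name[3:])):
        print(f"{cpu}: {usage[cpu]:5.1f}% busy")
```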
As the message reports, this is a bug in kernel-level code.
Those CPUs are stuck running kernel code in the context of the vmx-vcpu-0 task, and that code has not yielded control of the CPU for a long period of time.
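For a sense of what "a long period of time" means here: the kernel's lockup-watchdog documentation defines a soft lockup as looping in kernel mode, without giving other tasks a chance to run, for longer than twice watchdog_thresh. watchdog_thresh defaults to 10 seconds, which gives the roughly 20 s limit that the 23 s reports exceed. The toy model below only mirrors that arithmetic under those assumptions, and its helper names are hypothetical. The real check lives in kernel/watchdog.c and runs from a per-CPU timer.

```python
DEFAULT_WATCHDOG_THRESH = 10  # seconds; assumed usual default


def read_watchdog_thresh(path="/proc/sys/kernel/watchdog_thresh"):
    """Current watchdog_thresh, or the assumed default if it can't be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return DEFAULT_WATCHDOG_THRESH


def softlockup_threshold(watchdog_thresh):
    """Seconds a CPU may go without scheduling before a soft lockup is reported."""
    return 2 * watchdog_thresh


def is_soft_lockup(seconds_stuck, watchdog_thresh=DEFAULT_WATCHDOG_THRESH):
    """Toy version of the check; a watchdog_thresh of 0 disables detection."""
    if watchdog_thresh == 0:
        return False
    return seconds_stuck > softlockup_threshold(watchdog_thresh)


if __name__ == "__main__":
    thresh = read_watchdog_thresh()
    print(f"watchdog_thresh = {thresh}s, soft lockup after > {softlockup_threshold(thresh)}s")
    print("stuck for 23s:", "reported" if is_soft_lockup(23, thresh) else "not reported")
```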
As for what to do: open a ticket with VMware. vmx-vcpu-0 looks like their code, but I'm not totally sure.