A couple of days ago we stumbled upon a disturbing issue with a recently installed ESXi 5 management host for VDI. We were preparing a base VM for Linked-Clone deployment, and whenever we access its admin share from another machine ("\\vm\c$"), the entire Management network locks up. We can browse for a bit, but after digging through a few folders Explorer hangs. From then on, the host and all of the VMs on it are completely unreachable from the vSphere Client. If I physically walk over to the ESXi server, I can log in and reboot it, and it comes back just fine. I can reliably crash it this way from any Windows-based VM (7 and 2008 R2), 99% of the time. Today I experimented with the different physical ports on the server (there are 4). Once the Management network crashes on one port, moving it to another port and restarting the Management Network gets me back in, but if I then open a share remotely, I can crash that port, too. A reboot clears it all up.
I've combed through the logs on the server and haven't turned up anything of use. Any ideas?
After about an hour with VMware support, we got to the bottom of the issue: there is a known bug affecting Broadcom's tg3 Ethernet driver on ESXi. Disabling NetQueue (NetQ) has, so far, made the problem go away. I still see a delay of a few seconds when browsing into certain folders over the network, but they eventually load and the NIC no longer crashes.
Count the Broadcom (tg3) NICs in the host (4 in our case).
Reboot the host and you're done.
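The NIC count presumably matters because the tg3 NetQueue setting is given per port, as a comma-separated list with one entry per NIC. That format is an assumption here, not something stated above, and the exact command is not shown. A minimal Python sketch, assuming that format, which counts tg3 ports in a hypothetical sample of `esxcli network nic list` output and builds a "disabled" value for each port. The sample text and the helpers `count_driver_nics` and `per_nic_value` are made up for illustration:

```python
# Hypothetical sample of `esxcli network nic list` output (illustrative only,
# not captured from the host described in this post).
SAMPLE = """\
Name    PCI Device    Driver  Link  Speed  Duplex  MAC Address        MTU   Description
------  ------------  ------  ----  -----  ------  -----------------  ----  -----------
vmnic0  0000:02:00.0  tg3     Up     1000  Full    00:00:00:00:00:01  1500  Broadcom NIC
vmnic1  0000:02:00.1  tg3     Up     1000  Full    00:00:00:00:00:02  1500  Broadcom NIC
vmnic2  0000:02:00.2  tg3     Down      0  Half    00:00:00:00:00:03  1500  Broadcom NIC
vmnic3  0000:02:00.3  tg3     Down      0  Half    00:00:00:00:00:04  1500  Broadcom NIC
"""


def count_driver_nics(nic_list: str, driver: str = "tg3") -> int:
    """Count rows whose third column (Driver) matches `driver`."""
    count = 0
    for line in nic_list.splitlines():
        fields = line.split()
        # Skip blank lines, the header row and the dashed separator row.
        if len(fields) < 3 or fields[0] == "Name" or fields[0].startswith("-"):
            continue
        if fields[2] == driver:
            count += 1
    return count


def per_nic_value(nic_count: int, value: int = 0) -> str:
    """Build one comma-separated entry per NIC (assumed format), e.g. 4 -> "0,0,0,0"."""
    if nic_count < 1:
        raise ValueError("need at least one NIC")
    return ",".join([str(value)] * nic_count)


if __name__ == "__main__":
    n = count_driver_nics(SAMPLE)
    print(n)                 # 4
    print(per_nic_value(n))  # 0,0,0,0
```

If the count is off (for example, a port is missed), the list length no longer matches the number of NICs, so it is worth double-checking the count against the physical ports before rebooting.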