I have a HyperV cluster made up of 3 hosts. Each host is connected to both of my Nexus 5548 switches in an EtherChannel, with LACP on the switch side and NIC teaming using Broadcom 802.3ad on the server side. This gives me 2 Gb/s of aggregate bandwidth and also provides fault tolerance.
The problem I am having occurs when I perform a live migration. Before the live migration both Nexus switches show the MAC of the VM in the ARP table. After the migration one switch shows the MAC of the VM and the other shows the MAC of the HyperV host which it moved to.
I ran a packet capture and saw the HyperV host send a gratuitous ARP with the IP of the VM and the MAC of the host instead of the MAC of the VM. I lose layer 3 connectivity when this happens. I have to manually clear the ARP entry from the switch or wait about 7 minutes for it to correct itself.
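For anyone who wants to check their own capture for the same symptom, here is a minimal Python sketch of the comparison I did by eye. It assumes untagged Ethernet II frames (no 802.1Q header) exported as raw bytes. The function names, MAC addresses and IP below are made up for illustration and do not come from my capture.

```python
import socket
import struct

ETHERTYPE_ARP = 0x0806


def mac_to_bytes(mac: str) -> bytes:
    """Accept aa:bb:cc:dd:ee:ff or AA-BB-CC-DD-EE-FF."""
    return bytes.fromhex(mac.replace(":", "").replace("-", ""))


def bytes_to_mac(raw: bytes) -> str:
    return ":".join(f"{b:02x}" for b in raw)


def build_gratuitous_arp(sender_mac: str, ip: str) -> bytes:
    """Build a broadcast ARP request whose sender IP equals its target IP."""
    sha = mac_to_bytes(sender_mac)
    addr = socket.inet_aton(ip)
    eth = struct.pack("!6s6sH", b"\xff" * 6, sha, ETHERTYPE_ARP)
    # htype=1 (Ethernet), ptype=0x0800 (IPv4), hlen=6, plen=4, oper=1 (request)
    arp = struct.pack("!HHBBH6s4s6s4s", 1, 0x0800, 6, 4, 1,
                      sha, addr, b"\x00" * 6, addr)
    return eth + arp


def parse_arp(frame: bytes) -> dict:
    """Parse an untagged Ethernet II frame carrying an IPv4 ARP packet."""
    if len(frame) < 42:
        raise ValueError("frame too short for Ethernet + ARP")
    _dst, _src, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != ETHERTYPE_ARP:
        raise ValueError("not an ARP frame")
    _htype, _ptype, _hlen, _plen, _oper, sha, spa, _tha, tpa = struct.unpack(
        "!HHBBH6s4s6s4s", frame[14:42])
    return {
        "sender_mac": bytes_to_mac(sha),
        "sender_ip": socket.inet_ntoa(spa),
        "target_ip": socket.inet_ntoa(tpa),
        "gratuitous": spa == tpa,  # sender IP announced to everyone
    }


def announces_wrong_mac(frame: bytes, vm_ip: str, vm_mac: str) -> bool:
    """True if a gratuitous ARP for vm_ip carries a MAC other than vm_mac."""
    arp = parse_arp(frame)
    expected = bytes_to_mac(mac_to_bytes(vm_mac))
    return (arp["gratuitous"]
            and arp["sender_ip"] == vm_ip
            and arp["sender_mac"] != expected)
```

Feeding it a frame built with the host's MAC and the VM's IP, for example `announces_wrong_mac(build_gratuitous_arp(host_mac, vm_ip), vm_ip, vm_mac)`, flags the mismatch described above.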
I did some looking around and people are having similar issues when dealing with NIC teaming using Broadcom. Has anyone seen this? Any advice?
-------- Edit added below
I am only having this problem when teaming using Link Aggregation 802.3ad. The Broadcom teaming options are...
- Link Aggregation (802.3ad)
- Smart Load Balancing (TM) and Failover
- SLB (Auto-Fallback Disable)
- Generic Trunking (FEC/GEC) / 802.3ad-Draft Static
I switched to Smart Load Balancing and the VM now live migrates without losing any network connectivity. However, although the ARP tables on the Nexus switches are in sync, they show the MAC address of the host and not the VM. This is the opposite of what I expected. Shouldn't the ARP tables of the switches show the MAC of the VM? If not, and they are supposed to show the MAC of the host, why?
Okay. After 3 weeks of intense battle I finally got it all figured out.
I opened a case with Broadcom support and after going back and forth for a few days here is the response I received from a Broadcom software developer.
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Blfp\Parameters\1]
"HyperVMode"=dword:00000001
So with that said, I gave it a try. First, I created the registry key. Then I changed the Virtual Switch to Private Virtual Machine Network mode and gave the server a quick reboot. Finally, when the server came back online, I configured the Virtual Switch back to External mode and chose the BASP Virtual Adapter. I tested live migrations and everything worked perfectly: the ARP tables on the Nexus switches showed the VM's IP and the VM's MAC. YEY!!!!
Here is the setting within SCVMM. I was having the same issues until I turned on Trunk mode for the host. To find it, right-click the host in SCVMM --> Properties --> Networking tab.