We are seeing strange behaviour in our ESX cluster.
The Infrastructure:
We have two ESXi 5.5.0 (build 2718055) hosts in a cluster, managed by vCenter. We are using an Essentials licence, so we do not have distributed switches. Our company network has multiple VLANs, of which about 10 are needed by VMs. The hardware is HP DL380 Gen8 with 8 1Gb Ethernet ports. The switch ports (Cisco 2960E and 3850E) connected to the servers are configured as Cisco VLAN trunks, so all packets arrive with their VLAN tag. The physical networking is completely redundant: one of the two switches AND one of the two network cards in a server can fail without crashing the VMs.
All switchports are configured the same,
I am using 2 virtual switches (on each host), each switch has assigned
The Problem
When I reboot a VM that is placed on esx1 and uses automatic IP address configuration, the machine does not get a DHCP lease. The network connection itself is available: if I set a manual IP address, everything works fine. But ipconfig /renew hangs, and DHCPExplorer does not find a valid DHCP server (which I can ping if I assign a manual IP address).
I then have to migrate the machine to esx2 and wait for some time (or run ipconfig /renew, or disable and re-enable the NIC) before the machine gets a DHCP address. After that I can move the machine back to esx1 and it works perfectly fine. I even get positive results from DHCPExplorer then.
I then tested whether the behaviour was connected to the physical part of the network: I removed all physical NICs but one from the port group with the affected VLAN, did some reboots with a DHCP machine, and then repeated the test with another NIC. In short, I forced all traffic from this port group through one physical port of the NIC and the switch.
The result: the problem only occurs on two different ports on two different NICs, but both are connected to the same switch.
It seems to me as if this switch is somehow blocking access to the DHCP service. Has anyone seen behaviour like this? I am running out of options. We want to upgrade to ESXi 6 soon, but since we also run VMware View desktop virtualisation, the upgrade will involve a lot of work and testing and can't be done quickly...
EDIT:
Since the visual config of our switches is too large for the screen, I exported the virtual switches and port groups via PowerShell.
The problematic host is host-1002, the problematic NICs I identified are vmnic4 and vmnic8, and the port groups where the problem was observed are PORTGROUP35 and PORTGROUP41.
Get-VirtualSwitch | Select-Object Name, Id, NumPorts, NumPortsAvailable, Nic, Mtu, VMHostId
RESULT:
Name : vSwitch0
Id : key-vim.host.VirtualSwitch-vSwitch0
NumPorts : 4352
NumPortsAvailable : 4309
Nic : {vmnic7, vmnic0, vmnic2, vmnic9}
Mtu : 1500
VMHostId : HostSystem-host-1001
Name : vSwitch2
Id : key-vim.host.VirtualSwitch-vSwitch2
NumPorts : 4352
NumPortsAvailable : 4309
Nic : {vmnic3, vmnic1, vmnic6, vmnic8}
Mtu : 1500
VMHostId : HostSystem-host-1001
Name : vSwitch5
Id : key-vim.host.VirtualSwitch-vSwitch5
NumPorts : 4352
NumPortsAvailable : 4309
Nic : {vmnic4}
Mtu : 1500
VMHostId : HostSystem-host-1001
Name : vSwitch0
Id : key-vim.host.VirtualSwitch-vSwitch0
NumPorts : 4352
NumPortsAvailable : 4304
Nic : {vmnic7, vmnic3, vmnic5, vmnic9}
Mtu : 1500
VMHostId : HostSystem-host-1002
Name : vSwitch2
Id : key-vim.host.VirtualSwitch-vSwitch2
NumPorts : 4352
NumPortsAvailable : 4304
Nic : {vmnic8, vmnic4, vmnic6, vmnic2}
Mtu : 1500
VMHostId : HostSystem-host-1002
Name : vSwitch5
Id : key-vim.host.VirtualSwitch-vSwitch5
NumPorts : 4352
NumPortsAvailable : 4304
Nic : {vmnic1}
Mtu : 1500
VMHostId : HostSystem-host-1002
Get-VirtualPortGroup | Select-Object Name, VirtualSwitchId, Key, VLanId, VMHostId
RESULT:
Name : PORTGROUP82
VirtualSwitchId : key-vim.host.VirtualSwitch-vSwitch0
Key : key-vim.host.PortGroup-PORTGROUP82
VLanId : 82
VMHostId : HostSystem-host-1001
Name : PORTGROUP90
VirtualSwitchId : key-vim.host.VirtualSwitch-vSwitch0
Key : key-vim.host.PortGroup-PORTGROUP90
VLanId : 90
VMHostId : HostSystem-host-1001
Name : PORTGROUP83
VirtualSwitchId : key-vim.host.VirtualSwitch-vSwitch0
Key : key-vim.host.PortGroup-PORTGROUP83
VLanId : 83
VMHostId : HostSystem-host-1001
Name : PORTGROUP16
VirtualSwitchId : key-vim.host.VirtualSwitch-vSwitch0
Key : key-vim.host.PortGroup-PORTGROUP16
VLanId : 16
VMHostId : HostSystem-host-1001
Name : Management Network
VirtualSwitchId : key-vim.host.VirtualSwitch-vSwitch0
Key : key-vim.host.PortGroup-Management Network
VLanId : 41
VMHostId : HostSystem-host-1001
Name : PORTGROUP80
VirtualSwitchId : key-vim.host.VirtualSwitch-vSwitch2
Key : key-vim.host.PortGroup-PORTGROUP80
VLanId : 80
VMHostId : HostSystem-host-1001
Name : PORTGROUP41
VirtualSwitchId : key-vim.host.VirtualSwitch-vSwitch2
Key : key-vim.host.PortGroup-PORTGROUP41
VLanId : 41
VMHostId : HostSystem-host-1001
Name : PORTGROUP35
VirtualSwitchId : key-vim.host.VirtualSwitch-vSwitch2
Key : key-vim.host.PortGroup-PORTGROUP35
VLanId : 35
VMHostId : HostSystem-host-1001
Name : VMkernel
VirtualSwitchId : key-vim.host.VirtualSwitch-vSwitch5
Key : key-vim.host.PortGroup-VMkernel
VLanId : 0
VMHostId : HostSystem-host-1001
Name : PORTGROUP43
VirtualSwitchId : key-vim.host.VirtualSwitch-vSwitch0
Key : key-vim.host.PortGroup-PORTGROUP43
VLanId : 43
VMHostId : HostSystem-host-1001
Name : PORTGROUP82
VirtualSwitchId : key-vim.host.VirtualSwitch-vSwitch0
Key : key-vim.host.PortGroup-PORTGROUP82
VLanId : 82
VMHostId : HostSystem-host-1002
Name : PORTGROUP83
VirtualSwitchId : key-vim.host.VirtualSwitch-vSwitch0
Key : key-vim.host.PortGroup-PORTGROUP83
VLanId : 83
VMHostId : HostSystem-host-1002
Name : PORTGROUP90
VirtualSwitchId : key-vim.host.VirtualSwitch-vSwitch0
Key : key-vim.host.PortGroup-PORTGROUP90
VLanId : 90
VMHostId : HostSystem-host-1002
Name : PORTGROUP16
VirtualSwitchId : key-vim.host.VirtualSwitch-vSwitch0
Key : key-vim.host.PortGroup-PORTGROUP16
VLanId : 16
VMHostId : HostSystem-host-1002
Name : Management Network
VirtualSwitchId : key-vim.host.VirtualSwitch-vSwitch0
Key : key-vim.host.PortGroup-Management Network
VLanId : 41
VMHostId : HostSystem-host-1002
Name : PORTGROUP80
VirtualSwitchId : key-vim.host.VirtualSwitch-vSwitch2
Key : key-vim.host.PortGroup-PORTGROUP80
VLanId : 80
VMHostId : HostSystem-host-1002
Name : PORTGROUP41
VirtualSwitchId : key-vim.host.VirtualSwitch-vSwitch2
Key : key-vim.host.PortGroup-PORTGROUP41
VLanId : 41
VMHostId : HostSystem-host-1002
Name : PORTGROUP35
VirtualSwitchId : key-vim.host.VirtualSwitch-vSwitch2
Key : key-vim.host.PortGroup-PORTGROUP35
VLanId : 35
VMHostId : HostSystem-host-1002
Name : VMkernel
VirtualSwitchId : key-vim.host.VirtualSwitch-vSwitch5
Key : key-vim.host.PortGroup-VMkernel
VLanId : 0
VMHostId : HostSystem-host-1002
Name : PORTGROUP43
VirtualSwitchId : key-vim.host.VirtualSwitch-vSwitch0
Key : key-vim.host.PortGroup-PORTGROUP43
VLanId : 43
VMHostId : HostSystem-host-1002
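The export above can be cross-referenced to see which physical NICs a given port group can use on a host. The following is only a minimal sketch in Python: it embeds a shortened copy of the host-1002 records from the output above, the helper names (parse_records, uplinks) are made up for illustration, and it assumes the port groups inherit the NIC list of their vSwitch (any teaming override at port group level is not visible in this export).

```python
# Shortened copy of the vSwitch export for host-1002 (fields not needed here omitted).
SWITCHES = """
Name : vSwitch0
Id : key-vim.host.VirtualSwitch-vSwitch0
Nic : {vmnic7, vmnic3, vmnic5, vmnic9}
VMHostId : HostSystem-host-1002
Name : vSwitch2
Id : key-vim.host.VirtualSwitch-vSwitch2
Nic : {vmnic8, vmnic4, vmnic6, vmnic2}
VMHostId : HostSystem-host-1002
"""

# Shortened copy of the port group export for host-1002.
PORTGROUPS = """
Name : PORTGROUP82
VirtualSwitchId : key-vim.host.VirtualSwitch-vSwitch0
VLanId : 82
VMHostId : HostSystem-host-1002
Name : PORTGROUP41
VirtualSwitchId : key-vim.host.VirtualSwitch-vSwitch2
VLanId : 41
VMHostId : HostSystem-host-1002
Name : PORTGROUP35
VirtualSwitchId : key-vim.host.VirtualSwitch-vSwitch2
VLanId : 35
VMHostId : HostSystem-host-1002
"""


def parse_records(text):
    """Turn 'Key : Value' list output into one dict per record.

    A new record starts at every 'Name' line, as in the export above.
    """
    records = []
    for line in text.strip().splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Name":
            records.append({})
        if records:
            records[-1][key] = value
    return records


def uplinks(host_id, portgroup):
    """Return the NICs of the vSwitch that carries `portgroup` on `host_id`."""
    switches = {(s["VMHostId"], s["Id"]): s for s in parse_records(SWITCHES)}
    for pg in parse_records(PORTGROUPS):
        if pg["VMHostId"] == host_id and pg["Name"] == portgroup:
            nic_field = switches[(host_id, pg["VirtualSwitchId"])]["Nic"]
            return [nic.strip() for nic in nic_field.strip("{}").split(",")]
    return []
```

For example, uplinks("HostSystem-host-1002", "PORTGROUP35") returns ['vmnic8', 'vmnic4', 'vmnic6', 'vmnic2'], which includes the two NICs (vmnic4 and vmnic8) identified as problematic.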
EDIT: NEW INFORMATION
I have now realised why the problem only happens on esx1: the DHCP server for these machines is a VM placed on esx2, so DHCP requests from machines on esx2 do not even have to leave the virtual switch. If I move the DHCP server to esx1, the problem disappears there and starts on esx2. Still only one physical switch is affected; the other one works fine. So in my opinion the problem definitely lies in the physical switch, not the virtual one.
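The reasoning behind this observation can be written down as a tiny model. This is only a sketch of the hypothesis, not a description of how the vSwitch is implemented: the switch labels "switch-A" and "switch-B" are placeholders (the post does not name the switches), and it simplifies by looking only at the uplink the client's traffic is forced through.

```python
def leaves_host(client_host, dhcp_host):
    """DHCP traffic stays inside the vSwitch when client and server share a host.

    Assumes both VMs sit in the same port group on the same vSwitch.
    """
    return client_host != dhcp_host


def dhcp_expected_to_work(client_host, dhcp_host, uplink_switch, suspect_switch):
    """Predict the outcome under the hypothesis that one physical switch drops DHCP."""
    if not leaves_host(client_host, dhcp_host):
        return True  # never touches a physical switch
    return uplink_switch != suspect_switch


# Placeholder labels for the two physical switches.
SUSPECT = "switch-A"

# Client and DHCP server both on esx2: works regardless of the uplink.
same_host = dhcp_expected_to_work("esx2", "esx2", "switch-A", SUSPECT)
# Client on esx1, server on esx2, traffic forced through the suspect switch: fails.
via_suspect = dhcp_expected_to_work("esx1", "esx2", "switch-A", SUSPECT)
# Same placement, but forced through the other switch: works.
via_other = dhcp_expected_to_work("esx1", "esx2", "switch-B", SUSPECT)
```

The three cases match what was observed: the failure follows the host that does not run the DHCP server, and only when its traffic goes through one particular physical switch.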
Your switch may have inconsistent spanning tree settings on the different switch ports.
How long are you waiting before you consider this "failed"? Do you have access to the Cisco switch configuration?
Outside of that, it would be good to see your Virtual Switch configuration like this example.
Thanks for updating your question and comments. Basically, you need to set a 'DHCP helper' on the specific switch for that port/VLAN.
On the switch, do:
enable
conf t
int {whatever port}
ip helper-address {DHCP server IP or cluster VIP}
Then test and, if successful, write your config back to startup.
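For the test step, a small script-based DHCP probe can be handy next to ipconfig /renew. The following is a rough Python sketch, not a full DHCP client: it broadcasts a single DHCPDISCOVER and reports the address that sent back the first reply carrying the same transaction ID. The function names and the MAC address in the usage note are placeholders. It needs root or administrator rights to bind UDP port 68, and on a machine with a running DHCP client that port may already be in use.

```python
import socket
import struct
import time

MAGIC_COOKIE = b"\x63\x82\x53\x63"  # marks the start of the DHCP options


def build_discover(mac, xid):
    """Build a minimal DHCPDISCOVER packet, padded to 300 bytes."""
    chaddr = bytes.fromhex(mac.replace(":", ""))
    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1,        # op: BOOTREQUEST
        1,        # htype: Ethernet
        6,        # hlen: MAC address length
        0,        # hops
        xid,      # transaction ID, echoed by the server
        0,        # secs
        0x8000,   # flags: ask the server to reply by broadcast
        b"", b"", b"", b"",  # ciaddr, yiaddr, siaddr, giaddr (all zero)
        chaddr,   # client hardware address, zero-padded to 16 bytes
        b"", b"",  # sname, file
    )
    options = b"\x35\x01\x01" + b"\xff"  # option 53 = DHCPDISCOVER, then END
    return (header + MAGIC_COOKIE + options).ljust(300, b"\x00")


def probe(mac, xid=0x12345678, timeout=5.0):
    """Broadcast one DISCOVER; return the replying address, or None on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.bind(("", 68))
        sock.sendto(build_discover(mac, xid), ("255.255.255.255", 67))
        deadline = time.monotonic() + timeout
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return None
            sock.settimeout(remaining)
            try:
                data, (addr, _port) = sock.recvfrom(2048)
            except socket.timeout:
                return None
            # op == 2 is BOOTREPLY; the xid must match our request.
            if len(data) >= 240 and data[0] == 2 and data[4:8] == struct.pack("!I", xid):
                return addr
```

Calling probe("00:50:56:00:00:01") from a VM on the affected host returns None if no reply arrives within the timeout. If a relay is involved, the returned address may be that of the relay rather than the DHCP server.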