I've got a development web server environment consisting of 2 Windows 2012 VMs configured with NLB (Multicast mode). Each VM uses a single network adapter with 2 IPs assigned: the shared NLB IP and its own non-shared individual IP. Our usage pattern is to run builds and deployments to one of the VMs while it is held out of the active NLB pool. Once the build is done, the freshly updated server is added back into the pool, and the other one is removed. This way we maintain constant uptime.
My problem is that on one of the two servers, IIS will not allow connections through NLB from IP addresses outside our network. If that server is the only active one in the NLB pool, internal PCs can connect, but hosts outside our network get a timeout. I can see the incoming TCP SYNs in Wireshark, but no SYN-ACK is ever sent back. If I flip the NLB config to the other server, both internal and external IPs can connect. I verified this from an outside host that I control: I attempted an HTTP request using the NLB IP and got the same result.
Things I've checked:
- As far as I can tell, the NLB and IIS configuration/bindings are the same on both servers.
- There are no IP filtering rules in IIS.
- Windows Firewall is turned off.
- I tried switching to Unicast NLB mode, but that broke everything.
- I tried changing the NLB host priority to 4 and 3 instead of 1 and 2, with no effect.
- I also eliminated host headers and SSL from the equation by reproducing the problem with just a simple request by IP address on port 80.
How can I troubleshoot why IIS is not ACKing the SYN from outside IPs? Can it log this sort of thing?
If I can't troubleshoot why IIS is not allowing the connection on Server 2, my only recourse will be to rebuild the whole VM from scratch and try again...
One of our server admins found the issue: the problem VM did not have the correct default gateway IP address in its network adapter settings. Without a usable default gateway, the server has no route for replies to addresses outside its own subnet, which explains why local requests were OK but requests from outside IPs never got a response.
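To illustrate why the gateway setting only affected outside clients, here is a minimal Python sketch of the routing decision. It is a simplification that assumes a single adapter and no extra static routes, and every address in it is hypothetical (the 192.168.10.0/24 subnet, the 192.168.10.1 gateway, and the 203.0.113.50 client are made up for the example, not taken from our environment).

```python
import ipaddress
from typing import Optional


def needs_gateway(local_interface: str, remote_ip: str) -> bool:
    """Return True if remote_ip lies outside the subnet of local_interface.

    local_interface is an address with prefix length, e.g. "192.168.10.21/24".
    """
    iface = ipaddress.ip_interface(local_interface)
    return ipaddress.ip_address(remote_ip) not in iface.network


def can_reply(local_interface: str, remote_ip: str,
              default_gateway: Optional[str]) -> bool:
    """Simplified model: can this host route a reply back to remote_ip?

    Assumes one adapter and no static routes other than the default gateway.
    """
    if not needs_gateway(local_interface, remote_ip):
        # Same subnet: the reply is delivered directly, no gateway involved.
        return True
    if default_gateway is None:
        # Off-subnet client and no gateway configured: no route for the reply.
        return False
    # A gateway is only usable if it is itself reachable on the local subnet.
    iface = ipaddress.ip_interface(local_interface)
    return ipaddress.ip_address(default_gateway) in iface.network
```

With these hypothetical values, `can_reply("192.168.10.21/24", "192.168.10.77", None)` is true for an internal client, while `can_reply("192.168.10.21/24", "203.0.113.50", None)` is false for an outside one, which matches the symptom: the SYN arrives but the reply has nowhere to go.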
For future reference, a good test in this situation is to check that the VM can initiate outbound connections to external IPs, e.g. fire up a web browser on the VM and go to google.com. I have a feeling that would also have failed while the VM was in the bad state, and it would have been easier to troubleshoot.
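The outbound check can also be scripted instead of using a browser. This is only a sketch, assuming Python is available on the VM; `can_connect` is a helper name made up for this example, and the 5-second timeout is an arbitrary choice.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    DNS failures, timeouts, refused connections, and "network unreachable"
    errors all raise OSError subclasses, so any of them yields False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run on the suspect VM, a call such as `can_connect("www.google.com", 80)` returning False while the same call succeeds on the healthy VM would point at the server's own network settings rather than at IIS or NLB. Because a DNS failure also returns False, trying a known external IP address as well helps separate name resolution problems from routing problems.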