I am running IIS 7.5 on a Server 2008 R2 VM, hosted on a Windows Server 2008 R2 Server Core machine (Intel server hardware), sitting behind a Sonicwall firewall.
For a number of months now, we've had a trickle of customers (maybe one per week) contact us to say that they can't access the website. When this happens I immediately start diagnosing the problem and here's what I find:
- I can access the website.
- Our support staff at other locations can access the website.
- Presumably (because we don't hear from them), other customers can access the website.
- The customer can ping and tracert to the server.
- The customer cannot access other websites on the same server that share the same IP address.
- The customer can access other websites on the same server that use different IP addresses.
- iisreset does not resolve the problem.
- Resetting the customer's router does not resolve the problem.
- Flushing our firewall's ARP cache does not resolve the problem.
- Changing the customer's browser and/or rebooting his machine does not resolve the problem.
- Switching to a different computer behind the customer's router does not resolve the problem.
- After 15 to 30 minutes the problem resolves itself without any intervention and the customer can once again access the website.
- When it fails, the customer sees a timeout message and the IIS logs show no record of the request at all.
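Because ping succeeds while the browser times out and IIS logs nothing, one way to narrow this down is to test the TCP connection itself from the affected customer's side. The following is a minimal Python sketch under stated assumptions: `probe_tcp` is a hypothetical helper name, and the host and port passed to it are placeholders. A "timeout" result would suggest the packets are being silently dropped somewhere on the path, while "refused" would suggest something is actively rejecting the connection.

```python
import socket


def probe_tcp(host, port, timeout=5.0):
    """Classify one TCP connection attempt.

    Returns 'open', 'refused', 'timeout', or 'error'.
    (Hypothetical helper for illustration only.)
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No reply within the timeout: packets are likely being dropped.
        return "timeout"
    except ConnectionRefusedError:
        # The peer (or a device in between) answered with a reset.
        return "refused"
    except OSError:
        # Anything else, e.g. a name resolution failure or unreachable network.
        return "error"
```

Calling `probe_tcp("www.example.com", 80)` (placeholder hostname) during an outage, and again against one of the sites on a different IP address, would show whether the failure is at the TCP level for that one address only.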
Other notes:
- There doesn't seem to be a pattern as to which customers this problem affects.
- We are not using load balancing.
- Except for the firewall, there is no other security software/hardware in front of IIS.
- The IIS VM has all the latest Windows Updates.
- The Server Core installation has all the latest Windows Updates.
- The Sonicwall is running the latest firmware.
Things I suspect may be the problem:
- If the customer's browser were resolving the website's DNS name incorrectly, that could cause all of the above problems. Next time it happens I'll use Fiddler to verify which IP address the browser is trying to connect to. I'm not sure why ping would be able to resolve the name correctly from the command line in that case, though.
- Perhaps the Sonicwall is somehow blocking the connection. If this is the case, it is blocking only a specific source IP + destination IP + protocol, and only for 15 - 30 minutes. I do not have any of the Sonicwall's advanced filtering services licensed/activated. I can potentially test this theory by resetting the Sonicwall while the problem is happening, which is a bit of a scary proposition considering the other users accessing the server at the same time.
- Perhaps the virtual network connection between Server Core (the host O/S) and Server 2008 R2 (the guest O/S) is somehow blocking the connection for some period of time. Not sure how I can test/diagnose this one.
- Maybe some weird problem with the NIC drivers on the host machine? Not sure how to test this one either.
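The DNS theory above can be checked with a short script as well. This is a sketch only: `resolve_ipv4` and `dns_matches` are hypothetical names, and the hostname and expected address are placeholders you would supply. Note that it reports what the operating system's resolver returns, which is not necessarily what a browser with its own DNS cache is using, so it complements a tool like Fiddler rather than replacing it.

```python
import socket


def resolve_ipv4(hostname):
    """Return the set of IPv4 addresses the OS resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (ip, port) tuple.
    return {info[4][0] for info in infos}


def dns_matches(hostname, expected_ip):
    """True if expected_ip is among the addresses returned for hostname."""
    return expected_ip in resolve_ipv4(hostname)
```

For example, `dns_matches("www.example.com", "192.0.2.10")` (both values are placeholders) returning `False` on the customer's machine during an outage would point towards name resolution rather than the firewall or the virtual network.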
It's not a very satisfying resolution, but I wound up moving from the virtualized solution I described above to a standalone server and so far the problem has gone away. I don't know if it was a problem with the previous host machine's network card, the virtual network adapter sitting between the VM and the host machine, or something else entirely, but for the moment things are running smoothly. If the problem pops up again I'll update this question/answer.