I’m working in a large data center and have been assigned to troubleshoot an issue with a Windows (IIS) web server that acts as a portal for a customer of the data center. This portal server is on a DMZ at the local data center.
I don’t have access to the portal desktop and am relying on an off-site administrator to work with me to do testing and report the condition of the portal. He tells me there are no software firewalls or other filtering configured.
While most of the remote web pages work fine, several of the URLs the portal is supposed to serve up fail to load. I had Wireshark installed on the portal system and had a capture taken of one of the failures. I used IE to access one of the remote web servers at issue. I could see the TCP SYN-ACK coming back from the remote server, but after several HTTP GETs failed to get a response, the portal server sent a reset.
(response to answer 1: From the capture taken outside the firewall;
Internet Protocol,
Version: 4
Header length: 20 bytes
Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)
0000 00.. = Differentiated Services Codepoint: Default (0x00)
.... ..0. = ECN-Capable Transport (ECT): 0
.... ...0 = ECN-CE: 0
<snip>
Transmission Control Protocol
<snip>
Flags: 0x18 (PSH, ACK)
0... .... = Congestion Window Reduced (CWR): Not set
.0.. .... = ECN-Echo: Not set
It appears ECN is disabled.)
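For anyone who wants to check the flag bits by hand, here is a minimal Python sketch that decodes the two values quoted above (TCP flags 0x18 and a Differentiated Services byte of 0x00). The function names are made up for illustration; the bit positions are the standard ones for the TCP flags byte and the low two bits of the IP DS/ToS byte.

```python
# Bit masks for the 8-bit TCP flags field, highest bit first.
TCP_FLAGS = {
    "CWR": 0x80,
    "ECE": 0x40,
    "URG": 0x20,
    "ACK": 0x10,
    "PSH": 0x08,
    "RST": 0x04,
    "SYN": 0x02,
    "FIN": 0x01,
}


def decode_tcp_flags(value):
    """Return the names of the flags set in an 8-bit TCP flags value."""
    return [name for name, mask in TCP_FLAGS.items() if value & mask]


def decode_ip_ecn(ds_byte):
    """Return the 2-bit ECN field (ECT/CE) from the IP DS/ToS byte."""
    return ds_byte & 0x03


# Values taken from the capture excerpt in this post.
flags = decode_tcp_flags(0x18)
ecn_bits = decode_ip_ecn(0x00)
print(flags, ecn_bits)
```

Note that this only confirms what the quoted PSH, ACK segment shows. ECN is negotiated during the handshake, so the SYN and SYN-ACK of the same connection are the packets to inspect if you want to be sure: a client requesting ECN sets both ECE and CWR on its SYN.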
The webmaster of the remote web server assures me that no sites are being blocked. I had a capture taken outside the local firewall, so there should be no issue there.
Another tech set up a laptop and used the IP address of the portal (we took the portal off-line for the test). The laptop loads the URL as expected. I also had Firefox installed on the portal to make sure that the HTTP GET was not malformed. Same failure as with IE.
So, it seems it is not the remote web server or the network, because there was no problem with the laptop.
At this point, I’m not sure what other questions to ask or tests to do.
Disable ECN.
Unfortunately there's too little information to "analyze" the problem remotely or suggest something. There's a chance to get help from one of my colleagues in the US: www.wildpackets.com. They can either consult with you, provide you with an evaluation of our software, or send someone onsite to do the job.
Best regards, Linus
The ultimate resolution to this problem was found in the NIC settings on the portal server, specifically the TCP parameters in the registry. I did a wholesale reset using netsh (netsh int ip reset resetlog.txt). These parameters were identified as being of most interest:
"TcpMaxDataRetransmissions"=dword:0000000a
"DefaultTTL"=dword:00000040
"Tcp1323Opts"=dword:00000003
"TcpWindowSize"=dword:00a00000
(Remove or change this one. It is currently a HUGE, 10 MB window size. If you want to leave this one parameter hardcoded, set it to FAF0 (64240 bytes, just under 64 KB).)
"GlobalMaxTcpWindowSize"=dword:00a00000
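To make the hex values easier to read, here is a small Python sketch that converts the dwords listed above to decimal and compares the window size against the largest value that fits in the 16-bit TCP window field. The helper name and the 1460-byte MSS are my own assumptions for illustration (1460 is the usual MSS on Ethernet, not something taken from the capture).

```python
# REG_DWORD values copied from the list above (hexadecimal text).
params = {
    "TcpMaxDataRetransmissions": "0000000a",
    "DefaultTTL": "00000040",
    "Tcp1323Opts": "00000003",
    "TcpWindowSize": "00a00000",
    "GlobalMaxTcpWindowSize": "00a00000",
}


def dword_to_int(hex_text):
    """Convert a registry dword written as hex text to an integer."""
    return int(hex_text, 16)


decoded = {name: dword_to_int(text) for name, text in params.items()}
for name, value in decoded.items():
    print(f"{name} = {value}")

# Largest window the 16-bit TCP window field can carry without scaling.
MAX_UNSCALED_WINDOW = 0xFFFF
needs_scaling = decoded["TcpWindowSize"] > MAX_UNSCALED_WINDOW

# The suggested replacement value, and how many full segments it holds
# under the assumed 1460-byte MSS.
suggested_window = 0xFAF0
ASSUMED_MSS = 1460
full_segments, remainder = divmod(suggested_window, ASSUMED_MSS)
print(needs_scaling, suggested_window, full_segments, remainder)
```

The 10 MB window can only be advertised with window scaling, which is one of the two options Tcp1323Opts=3 turns on (the other is TCP timestamps). The suggested FAF0 value fits in the plain 16-bit field and works out to a whole number of 1460-byte segments.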
After performing the netsh operation the URL that was failing worked as expected.