Issue
I run an IRC server for 20-50 users. We sometimes have issues with messages not arriving in a timely fashion or at all. After some packet captures we determined that messages sit in the server's "Send-Q". When a message doesn't arrive I'll look at "netstat -ct" output and see something like this:
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 1756 ubuntu:ircd 10.8.1.7:63602 ESTABLISHED
Sometimes, if I wait a couple of minutes, the Send-Q drops to 0 and the message is delivered; other times the client times out. My question is: why aren't the messages delivered right away? What causes them to sit in Send-Q for so long?
sshd exhibits similar behavior: my SSH sessions freeze, and sometimes they come back, sometimes they time out.
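For anyone wanting to watch for this condition rather than eyeball "netstat -ct", here is a minimal Python sketch that parses netstat-style rows and flags connections with a non-zero Send-Q. On Linux, Send-Q for an established socket is the count of bytes not yet acknowledged by the remote host, so a value that stays above zero suggests that segments or their ACKs are not getting through. The function names and the second sample row are hypothetical, added only for illustration; the first row is the one from the capture above.

```python
from typing import List, NamedTuple


class Conn(NamedTuple):
    proto: str
    recv_q: int
    send_q: int
    local: str
    remote: str
    state: str


def parse_netstat(text: str) -> List[Conn]:
    """Parse TCP rows from 'netstat -t' style output (hypothetical helper)."""
    conns = []
    for line in text.splitlines():
        fields = line.split()
        # TCP data rows have exactly six columns; this skips the header row.
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue
        proto, recv_q, send_q, local, remote, state = fields
        conns.append(Conn(proto, int(recv_q), int(send_q), local, remote, state))
    return conns


def stuck(conns: List[Conn], threshold: int = 0) -> List[Conn]:
    """Return connections whose Send-Q exceeds the threshold (in bytes)."""
    return [c for c in conns if c.send_q > threshold]


# First data row is from the post; the second is made up for contrast.
sample = """Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 1756 ubuntu:ircd 10.8.1.7:63602 ESTABLISHED
tcp 0 0 ubuntu:ssh 10.8.1.9:50122 ESTABLISHED"""

conns = parse_netstat(sample)
for c in stuck(conns):
    print(f"{c.remote} has {c.send_q} unacknowledged bytes queued")
```

A single non-zero reading is normal while data is in flight; it is only the same connection staying non-zero across several samples that points at a delivery problem.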
Background
I'm not sure whether the infrastructure is related to the issue, so here's what it looks like: the clients are on Windows 7 and connect with OpenVPN. The OpenVPN server runs on pfSense, and the IRC server is on a local (NAT'd) LAN connected to pfSense. I have a firewall rule in place to allow clients to reach port 6667 on the server.
Investigating...
Latency/loss - looks decent enough. Not the best link ever, but I would think it would be fine for IRC and SSH. Here is a ping from my client to the server, taken while my IRC and SSH sessions were intermittently hanging:
Ping statistics for 10.8.5.2:
Packets: Sent = 4478, Received = 4460, Lost = 18 (0% loss)
Approximate round trip times in milli-seconds: Minimum = 17.2 ms, Maximum = 273.4 ms, Average = 32.3 ms
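As a quick sanity check on those statistics, the loss rate can be worked out from the sent and received counts. This is a minimal Python sketch; the function name is hypothetical. The summary line prints a whole-number percentage, which is presumably why 18 lost packets show up as 0%.

```python
def loss_percent(sent: int, received: int) -> float:
    """Packet loss as a percentage of packets sent."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * (sent - received) / sent


# Counts taken from the ping statistics above.
sent, received = 4478, 4460
pct = loss_percent(sent, received)
print(f"{sent - received} lost of {sent}: {pct:.2f}% loss")
```

Roughly 0.4% loss in this direction is low, but it only measures the path as seen from the client.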
MSS/MTU issues - MTU appears to be fine. OpenVPN mtu-test on my client says:
Thu Dec 03 12:41:21 2015 NOTE: Empirical MTU test completed [Tried,Actual] local->remote=[1589,1589] remote->local=[1589,1589]
...and here's my manual test:
> ping -f -l 1472 10.8.5.2
Pinging 10.8.5.2 with 1472 bytes of data:
Reply from 10.8.5.2: bytes=1472 time=23ms TTL=63
> ping -f -l 1473 10.8.5.2
Pinging 10.8.5.2 with 1473 bytes of data:
Packet needs to be fragmented but DF set.
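The 1472/1473 boundary lines up with a 1500-byte path MTU once headers are counted. A minimal Python sketch of the arithmetic, assuming an IPv4 header without options (20 bytes) and the standard 8-byte ICMP echo header; the constant and function names are hypothetical:

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def packet_size(payload: int) -> int:
    """Total IPv4 packet size for a ping with the given payload size."""
    return payload + ICMP_HEADER + IPV4_HEADER


def max_payload(mtu: int) -> int:
    """Largest ping payload that fits in one unfragmented packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


print(packet_size(1472))  # 1500: fits, so the ping gets a reply
print(packet_size(1473))  # 1501: one byte too large with DF set
print(max_payload(1500))  # 1472
```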
Bandwidth/throughput - did some iperf tests to make sure there wasn't a throughput issue. Again, looks decent enough:
iperf -c 10.8.5.2
------------------------------------------------------------
Client connecting to 10.8.5.2, TCP port 5001
TCP window size: 63.0 KByte (default)
------------------------------------------------------------
[ 3] local 10.8.0.23 port 18587 connected with 10.8.5.2 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 26.0 MBytes 21.8 Mbits/sec
Thanks, any help understanding "Send-Q" or more specific ideas about this issue would be much appreciated. Let me know if I can provide any more info here.
Update
Found out that I actually had massive packet loss. Pings from client->VPN didn't show this, but it was very apparent when using fping from VPN->client. I noticed it was only the Windows clients, and reinstalling the newest OpenVPN client seems to have fixed the loss. It might have been related to the OpenVPN TAP adapter being installed via disk imaging. Installing it manually per-machine seems to fix the problem.