I have two machines running as reverse proxy caches/load balancers in front of a site I admin. Recently at peak times I've been seeing a problem where the first few packets from any stream of packets (ICMP, UDP, TCP, whatever) arrives on the machine but is not responded to.
Here's the symptom from the point of view of someone pinging the machines:
PING X.X.X.X (X.X.X.X): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
Request timeout for icmp_seq 4
Request timeout for icmp_seq 5
Request timeout for icmp_seq 6
Request timeout for icmp_seq 7
Request timeout for icmp_seq 8
Request timeout for icmp_seq 9
Request timeout for icmp_seq 10
64 bytes from X.X.X.X: icmp_seq=11 ttl=62 time=47.515 ms
64 bytes from X.X.X.X: icmp_seq=12 ttl=62 time=46.108 ms
64 bytes from X.X.X.X: icmp_seq=13 ttl=62 time=48.893 ms
64 bytes from X.X.X.X: icmp_seq=14 ttl=62 time=47.466 ms
64 bytes from X.X.X.X: icmp_seq=15 ttl=62 time=49.679 ms
64 bytes from X.X.X.X: icmp_seq=16 ttl=62 time=50.011 ms
64 bytes from X.X.X.X: icmp_seq=17 ttl=62 time=49.324 ms
64 bytes from X.X.X.X: icmp_seq=18 ttl=62 time=48.989 ms
64 bytes from X.X.X.X: icmp_seq=19 ttl=62 time=51.003 ms
64 bytes from X.X.X.X: icmp_seq=20 ttl=62 time=48.612 ms
Here's a view of a HTTP session via tcpdump on the affected machine (X.X.X.X -> affected machine, C.C.C.C -> client making request):
21:46:27.105396 IP C.C.C.C.62425 > X.X.X.X.80: Flags [S], seq 139436485, win 65535, options [mss 1380,nop,wscale 3,nop,nop,TS val 398010008 ecr 0,sackOK,eol], length 0
21:46:28.032300 IP C.C.C.C.62425 > X.X.X.X.80: Flags [S], seq 139436485, win 65535, options [mss 1380,nop,wscale 3,nop,nop,TS val 398010017 ecr 0,sackOK,eol], length 0
21:46:28.032337 IP X.X.X.X.80 > C.C.C.C.62425: Flags [S.], seq 1108838018, ack 139436486, win 65535, options [mss 1380,nop,wscale 9,sackOK,TS val 1918451162 ecr 398010017], length 0
21:46:28.064417 IP C.C.C.C.62425 > X.X.X.X.80: Flags [.], ack 1, win 65535, options [nop,nop,TS val 398010018 ecr 1918451162], length 0
21:46:28.064438 IP C.C.C.C.62425 > X.X.X.X.80: Flags [P.], ack 1, win 65535, options [nop,nop,TS val 398010018 ecr 1918451162], length 160
21:46:28.165372 IP X.X.X.X.80 > C.C.C.C.62425: Flags [.], ack 161, win 128, options [nop,nop,TS val 1918451296 ecr 398010018], length 0
21:46:28.165933 IP X.X.X.X.80 > C.C.C.C.62425: Flags [.], ack 161, win 128, options [nop,nop,TS val 1918451296 ecr 398010018], length 1368
21:46:28.219978 IP C.C.C.C.62425 > X.X.X.X.80: Flags [.], ack 1369, win 65535, options [nop,nop,TS val 398010019 ecr 1918451296], length 0
21:46:28.220001 IP X.X.X.X.80 > C.C.C.C.62425: Flags [.], ack 161, win 128, options [nop,nop,TS val 1918451350 ecr 398010019], length 1368
21:46:28.220011 IP X.X.X.X.80 > C.C.C.C.62425: Flags [.], ack 161, win 128, options [nop,nop,TS val 1918451350 ecr 398010019], length 1368
21:46:28.288178 IP C.C.C.C.62425 > X.X.X.X.80: Flags [.], ack 4105, win 65493, options [nop,nop,TS val 398010020 ecr 1918451350], length 0
21:46:28.288196 IP X.X.X.X.80 > C.C.C.C.62425: Flags [.], ack 161, win 128, options [nop,nop,TS val 1918451418 ecr 398010020], length 1368
21:46:28.288203 IP X.X.X.X.80 > C.C.C.C.62425: Flags [.], ack 161, win 128, options [nop,nop,TS val 1918451418 ecr 398010020], length 1368
21:46:28.288210 IP X.X.X.X.80 > C.C.C.C.62425: Flags [.], ack 161, win 128, options [nop,nop,TS val 1918451418 ecr 398010020], length 1368
21:46:28.288217 IP X.X.X.X.80 > C.C.C.C.62425: Flags [.], ack 161, win 128, options [nop,nop,TS val 1918451418 ecr 398010020], length 1368
21:46:28.333968 IP C.C.C.C.62425 > X.X.X.X.80: Flags [.], ack 6841, win 65493, options [nop,nop,TS val 398010020 ecr 1918451418], length 0
21:46:28.333986 IP X.X.X.X.80 > C.C.C.C.62425: Flags [.], ack 161, win 128, options [nop,nop,TS val 1918451464 ecr 398010020], length 1368
21:46:28.333994 IP X.X.X.X.80 > C.C.C.C.62425: Flags [.], ack 161, win 128, options [nop,nop,TS val 1918451464 ecr 398010020], length 1368
21:46:28.338939 IP C.C.C.C.62425 > X.X.X.X.80: Flags [.], ack 9577, win 65493, options [nop,nop,TS val 398010020 ecr 1918451418], length 0
21:46:28.338955 IP X.X.X.X.80 > C.C.C.C.62425: Flags [.], ack 161, win 128, options [nop,nop,TS val 1918451469 ecr 398010020], length 1368
21:46:28.338962 IP X.X.X.X.80 > C.C.C.C.62425: Flags [.], ack 161, win 128, options [nop,nop,TS val 1918451469 ecr 398010020], length 1368
21:46:28.349943 IP C.C.C.C.62425 > X.X.X.X.80: Flags [.], ack 12313, win 65535, options [nop,nop,TS val 398010021 ecr 1918451464], length 0
21:46:28.354190 IP C.C.C.C.62425 > X.X.X.X.80: Flags [.], ack 15049, win 65535, options [nop,nop,TS val 398010021 ecr 1918451469], length 0
21:46:28.354206 IP X.X.X.X.80 > C.C.C.C.62425: Flags [P.], ack 161, win 128, options [nop,nop,TS val 1918451484 ecr 398010021], length 8
21:46:28.393441 IP C.C.C.C.62425 > X.X.X.X.80: Flags [.], ack 15057, win 65535, options [nop,nop,TS val 398010021 ecr 1918451484], length 0
21:46:28.393452 IP C.C.C.C.62425 > X.X.X.X.80: Flags [F.], seq 161, ack 15057, win 65535, options [nop,nop,TS val 398010021 ecr 1918451484], length 0
21:46:28.393467 IP X.X.X.X.80 > C.C.C.C.62425: Flags [.], ack 162, win 128, options [nop,nop,TS val 1918451524 ecr 398010021], length 0
21:46:28.393481 IP X.X.X.X.80 > C.C.C.C.62425: Flags [F.], seq 15057, ack 162, win 128, options [nop,nop,TS val 1918451524 ecr 398010021], length 0
21:46:28.445126 IP C.C.C.C.62425 > X.X.X.X.80: Flags [.], ack 15057, win 65535, options [nop,nop,TS val 398010021 ecr 1918451524], length 0
21:46:28.445138 IP C.C.C.C.62425 > X.X.X.X.80: Flags [.], ack 15058, win 65535, options [nop,nop,TS val 398010021 ecr 1918451524], length 0
The two machines are in a CARP pool with one as active and one as standby. The hardware and configuration of the machines are identical. The problem only affects the active machine and is visible for traffic going to the dedicated machine IP addresses and to the CARPed floating IP. Swapping the active to standby and vice versa moves the problem, so I'm pretty sure it's not a hardware issue.
They're using pf as a firewall and to NAT traffic for the machines behind them.
They're running FreeBSD 8.0-RELEASE-p5. While their kernels are custom-built, that's only to add the necessary bits to use CARP. The kernel configs are as follows:
include GENERIC
ident LOADBALANCER
device pf
device pflog
device pfsync
device carp
The network card is an Intel 82574L using the em driver.
Any clues?
It turns out that the problem was with pf.
pf on FreeBSD by default limits the number of state table entries to 10,000. The adaptive timeouts were keeping up most of the time, but could not cope at peak times.