Ubuntu Server 10.04.1 x86
I've got a machine with an FCGI HTTP service behind nginx that serves a lot of small HTTP requests to a lot of different clients. (About 230 requests per second in peak hours, average response size with headers is 650 bytes, several million different clients per day.)
As a result, I have a lot of sockets hanging in TIME_WAIT (the graph was captured with the TCP settings below):
I'd like to reduce the number of sockets.
What can I do besides this?
$ cat /proc/sys/net/ipv4/tcp_fin_timeout
1
$ cat /proc/sys/net/ipv4/tcp_tw_recycle
1
$ cat /proc/sys/net/ipv4/tcp_tw_reuse
1
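The same three values can also be read in one go with sysctl (equivalent to the cat commands above):

    $ sysctl net.ipv4.tcp_fin_timeout net.ipv4.tcp_tw_recycle net.ipv4.tcp_tw_reuse
    net.ipv4.tcp_fin_timeout = 1
    net.ipv4.tcp_tw_recycle = 1
    net.ipv4.tcp_tw_reuse = 1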
Update: some details on the actual service layout on the machine:
client -----TCP-socket--> nginx (load balancer reverse proxy) -----TCP-socket--> nginx (worker) --domain-socket--> fcgi-software --single-persistent-TCP-socket--> Redis --single-persistent-TCP-socket--> MySQL (other machine)
I probably should switch the load-balancer --> worker connection to domain sockets as well, but the TIME_WAIT issue would remain anyway: I plan to add a second worker on a separate machine soon, and I won't be able to use domain sockets in that case.
One thing you should do to start is to fix the
net.ipv4.tcp_fin_timeout=1
setting. That is way too low; you probably should not take it much lower than 30.

Since this is behind nginx: does that mean nginx is acting as a reverse proxy? If that is the case, then your connections are 2x (one to the client, one to your web servers). Do you know which end these sockets belong to?
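A quick sketch of both points, assuming the front nginx listens on port 80 (adjust the port to match your setup):

    # put the FIN-WAIT-2 timeout back to a saner value (runtime only;
    # add the same setting to /etc/sysctl.conf to survive reboots)
    sysctl -w net.ipv4.tcp_fin_timeout=30

    # rough split of TIME_WAIT sockets by leg: local port 80 means the
    # client-facing side, anything else is most likely proxy-to-backend
    netstat -ant | awk '$6 == "TIME_WAIT" && $4 ~ /:80$/' | wc -l
    netstat -ant | awk '$6 == "TIME_WAIT" && $4 !~ /:80$/' | wc -l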
Update:
fin_timeout is how long sockets stay in FIN-WAIT-2 (from
networking/ip-sysctl.txt
in the kernel documentation).

I think you may just have to let Linux keep the TIME_WAIT socket count up against what looks like a roughly 32k cap on them, which is where Linux recycles them. This 32k cap is alluded to in this link:

That link also suggests that the TIME_WAIT state lasts 60 seconds and cannot be tuned via proc.
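If you want to see what that cap actually is on your box, it is most likely the net.ipv4.tcp_max_tw_buckets sysctl (the kernel limit on how many sockets it will hold in TIME_WAIT; the value below is just an example, since the default scales with the machine's memory):

    # ceiling on TIME_WAIT sockets; once it is hit, additional sockets are
    # torn down immediately instead of sitting out the full TIME_WAIT period
    $ sysctl net.ipv4.tcp_max_tw_buckets
    net.ipv4.tcp_max_tw_buckets = 32768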
Random fun fact:
You can see the countdown timer on each TIME_WAIT socket with netstat:
netstat -on | grep TIME_WAIT | less
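The output looks roughly like this (addresses and timer values here are made up; the last column counts down the seconds remaining in TIME_WAIT):

    tcp   0   0 10.0.0.5:80   198.51.100.23:51034   TIME_WAIT   timewait (52.16/0/0)
    tcp   0   0 10.0.0.5:80   203.0.113.9:44870     TIME_WAIT   timewait (31.42/0/0)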
Reuse vs. Recycle:

These are kind of interesting; it reads like tcp_tw_reuse enables the reuse of TIME_WAIT sockets, and tcp_tw_recycle puts it into TURBO mode.
I wouldn't recommend using net.ipv4.tcp_tw_recycle as it causes problems with NAT clients.
Maybe you could try not having both of those switched on and see what effect it has (try one at a time and see how they work on their own)? I would use
netstat -n | grep TIME_WAIT | wc -l
for faster feedback than Munin.

tcp_tw_reuse is relatively safe as it allows TIME_WAIT connections to be reused.
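A rough sketch of that experiment, assuming you are happy flipping the sysctls at runtime (the sleeps are only there to let existing TIME_WAIT sockets drain before you compare counts, since TIME_WAIT lasts 60 seconds):

    # baseline: both off
    sysctl -w net.ipv4.tcp_tw_reuse=0
    sysctl -w net.ipv4.tcp_tw_recycle=0
    sleep 120
    echo -n "both off:   "; netstat -n | grep TIME_WAIT | wc -l

    # reuse on its own
    sysctl -w net.ipv4.tcp_tw_reuse=1
    sleep 120
    echo -n "reuse only: "; netstat -n | grep TIME_WAIT | wc -l

    # repeat with recycle instead of reuse if you still want to compare it,
    # keeping the NAT caveat above in mind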
Also, you could run more services listening on different ports behind your load balancer if running out of ports is a problem: each extra backend ip:port pair gives the balancer a fresh set of source ports to use (roughly 28k with the default ip_local_port_range).
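If you want to check whether you are actually close to that limit on the balancer-to-worker leg, something along these lines works (the port range shown is the usual default, and 127.0.0.1:8080 is a placeholder for your worker's address and port):

    # ephemeral port range the kernel uses for outgoing connections
    $ cat /proc/sys/net/ipv4/ip_local_port_range
    32768   61000

    # how many of those ports are currently stuck in TIME_WAIT towards one backend
    netstat -ant | grep TIME_WAIT | grep -c '127.0.0.1:8080'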