I run an Apache2 server which uses the Shibboleth daemon (shibd) as its federated authentication module. Certain connections opened through Shibboleth seem to remain permanently in the CLOSE_WAIT state:
tcp 38 0 blah.blah:57346 shib.server.:8443 CLOSE_WAIT
tcp 38 0 blah.blah:45601 shib.server2:8443 CLOSE_WAIT
tcp 38 0 blah.blah:41737 shib.server3:5057 CLOSE_WAIT
From what I can find out, CLOSE_WAIT means that the remote server has closed its end of the connection, but the local application has not yet closed its own end, as it should. I suspect shibd is responsible somehow.
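The sequence that produces CLOSE_WAIT can be reproduced on the loopback interface. The Python sketch below is illustrative only and is not taken from shibd; the function name demo_close_wait is hypothetical. One side closes, the other side sees EOF (recv returns an empty bytes object), and from that moment its socket sits in CLOSE_WAIT until the application calls close() itself.

```python
import socket


def demo_close_wait() -> bytes:
    """Reproduce the condition behind CLOSE_WAIT on 127.0.0.1.

    The 'server' side closes first; the 'client' side receives EOF.
    A misbehaving application would stop here and never close its socket.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()

    server_side.close()       # the remote end sends FIN
    data = client.recv(1024)  # b"" means EOF: the peer has closed
    # The client socket is now in CLOSE_WAIT (visible with `netstat -tn`
    # on Linux) and stays there until the application calls close().
    client.close()            # the step a well-behaved application must take
    listener.close()
    return data
```

The empty read is the application's cue to close; a program that ignores it (or never reads again) leaves the socket in CLOSE_WAIT indefinitely, which matches the symptom above.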
Needless to say, if enough CLOSE_WAIT connections accumulate, I have a problem.
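To watch how quickly the stuck connections accumulate, and toward which remote service, one option on Linux is to read /proc/net/tcp directly, where the kernel reports CLOSE_WAIT as state code 08 and ports in hexadecimal. This is a minimal sketch under that assumption; the function name is hypothetical, and IPv6 connections appear separately in /proc/net/tcp6 with the same column layout.

```python
from collections import Counter

TCP_CLOSE_WAIT = "08"  # CLOSE_WAIT state code in /proc/net/tcp (hex)


def close_wait_by_remote_port(proc_net_tcp: str) -> Counter:
    """Count CLOSE_WAIT sockets per remote port from /proc/net/tcp text."""
    counts = Counter()
    for line in proc_net_tcp.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated lines
        remote, state = fields[2], fields[3]  # rem_address, st columns
        if state == TCP_CLOSE_WAIT:
            # rem_address is "HEXIP:HEXPORT"; the port is plain hex
            port = int(remote.rsplit(":", 1)[1], 16)
            counts[port] += 1
    return counts
```

On the affected host this could be run as close_wait_by_remote_port(open("/proc/net/tcp").read()), for example from a periodic job, to see whether the counts for ports such as 8443 and 5057 keep growing.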
Trying to get rid of the CLOSE_WAIT connections by simply using
/etc/init.d/networking restart
does not work. In fact, networking seems to refuse to shut down and restart, and I get a "SIOCADDRT: File exists" error (i.e. networking is trying to start without having stopped first). The same problem occurs with ifup -a.
So I have two questions - one may be easy, and one harder.
- What's a good way to force networking to restart, and force whatever connections are stuck in CLOSE_WAIT to clear?
- Any ideas about how to fix Shibboleth and force the shibd module to behave?
The answer to 1, unfortunately, is to restart the process that still has references to the connections. Nothing else will force it to close them.
Eight years, seven months and a different Stack Exchange account later, and shibd (now in a new version) still has this behaviour.
The best, but entirely kludgy, way around the problem is to use a crontab about once a day to run
In the past this was itself a headache, as large metadata files meant shibd took many minutes to reload. The current version of shibd allows for 'as needed' loading of metadata from a remote host, which means reloads are now less problematic.
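Since the only reliable fix is restarting whichever process still holds the sockets, it may be worth confirming that this really is shibd before scheduling restarts. The sketch below is Linux-only and assumes a readable /proc and enough privilege (typically root) to inspect other users' processes; all helper names are hypothetical. It collects the socket inode numbers of CLOSE_WAIT entries from /proc/net/tcp (column 9), then looks for /proc/PID/fd links of the form socket:[inode].

```python
import os
import re

TCP_CLOSE_WAIT = "08"  # CLOSE_WAIT state code in /proc/net/tcp
SOCKET_LINK = re.compile(r"^socket:\[(\d+)\]$")


def close_wait_inodes(proc_net_tcp: str) -> set:
    """Return the socket inode numbers (as strings) of CLOSE_WAIT entries."""
    inodes = set()
    for line in proc_net_tcp.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # field 3 is the state, field 9 is the socket inode
        if len(fields) > 9 and fields[3] == TCP_CLOSE_WAIT:
            inodes.add(fields[9])
    return inodes


def pids_holding_inodes(inodes, proc_root: str = "/proc") -> dict:
    """Map each wanted socket inode to the set of PIDs holding an fd to it."""
    wanted = set(inodes)
    owners = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # fd was closed while we were looking
            match = SOCKET_LINK.match(target)
            if match and match.group(1) in wanted:
                owners.setdefault(match.group(1), set()).add(int(entry))
    return owners


def process_name(pid: int, proc_root: str = "/proc") -> str:
    """Best-effort command name for a PID ("?" if it has exited)."""
    try:
        with open(os.path.join(proc_root, str(pid), "comm")) as f:
            return f.read().strip()
    except OSError:
        return "?"


if __name__ == "__main__":
    with open("/proc/net/tcp") as f:
        stuck = close_wait_inodes(f.read())
    for inode, pids in pids_holding_inodes(stuck).items():
        for pid in sorted(pids):
            print(inode, pid, process_name(pid))
```

If every stuck inode maps to the shibd process, that supports restarting shibd (rather than networking) as the workaround. The ss utility can usually report the same ownership with `ss -tnp state close-wait`.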