My Java application is sometimes killed by an external script, either with SIGTERM or with SIGKILL.
The application is a server that receives many connections per second, so it can be killed while it is serving them.
I would like to restart the application whenever it's killed, so I have prepared a script for that purpose.
The problem is that, once the app has been killed, the new instance can't bind to the port used by the previous one and fails with "Address already in use". The previous instance's process has definitely terminated, yet the listening port is still open, now assigned to bash (or sh on other machines).
Obviously, my goal is to restart the application and let it bind successfully to the previous address.
I've tried waiting more than 200 seconds before restarting, to no avail, and in any case I can't afford to wait that long.
I've encountered this problem on every machine I've run the application on (a Jetty server on Java 1.6).
Any suggestion is appreciated, thanks,
Silvio
EDIT: Killing the JVM process is not the normal way my application exits; I do it only when there are problems (OutOfMemoryErrors). I never actually need SIGKILL, because SIGTERM always suffices: I would resort to SIGKILL only if SIGTERM failed, which has never happened. I'm working on a long-term solution; meanwhile I have to keep my app running by applying stitches here and there.
EDIT: To be clearer, this is the netstat -tunap | grep line I see before killing the process:
tcp6 0 0 :::8898 :::* LISTEN 22709/java
and this is the same line after killing the process:
tcp6 0 0 :::8898 :::* LISTEN 23665/sh
Notice that the process with PID 22709 has been killed and is gone, but the port is still in LISTEN state, now held by sh.
UPDATE: After I kill my application, netstat shows a long list of pending connections in CLOSE_WAIT state, with my IP as the destination. I can also see an sh process in LISTEN state on my port. When I kill it, a sleep process replaces it and listens on the same port; when I finally kill this sleep process, the port is released and I can restart my server successfully. That could be a way to get my port released, but I fear that automatically killing processes in order to release a port is a bit risky.
The server still expects some packets from the clients after the listening socket is closed, so the kernel keeps the port assigned. The application can set the SO_REUSEADDR socket option to allow immediate reuse of the socket address.
Here is an excerpt from my Linux ip(7) manual page:
The application or application server might have a configuration setting for using this socket option.
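A minimal sketch of setting the option on a plain java.net.ServerSocket, written for a current JDK rather than Java 1.6; ReuseAddressListener and openListener are made-up names, and Jetty's own connector settings are not shown. The option has to be set before bind(), which is why the socket is created unbound. Two hedged caveats: on Linux, SO_REUSEADDR lets you rebind over connections lingering in states such as TIME_WAIT, but it does not let you bind while another process still holds a socket in LISTEN state on the same port, which is what the netstat output in the question shows. Also, on Unix-like systems the JDK typically enables SO_REUSEADDR on ServerSocket by default, so setting it explicitly mostly documents intent.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class ReuseAddressListener {

    // Opens a listening socket with SO_REUSEADDR enabled.
    // The option must be set on an *unbound* socket, so we use the
    // no-arg constructor and call bind() ourselves afterwards.
    public static ServerSocket openListener(int port) throws IOException {
        ServerSocket server = new ServerSocket();
        server.setReuseAddress(true);                  // SO_REUSEADDR
        server.bind(new InetSocketAddress(port));
        return server;
    }

    public static void main(String[] args) throws IOException {
        // 8898 is the port from the question; pass another one as an argument if needed
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 8898;
        try (ServerSocket server = openListener(port)) {
            System.out.println("listening on " + server.getLocalPort()
                    + ", SO_REUSEADDR=" + server.getReuseAddress());
        }
    }
}
```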
You're not actually killing your Java application; you're killing the Java virtual machine (JVM) instance that is in turn running your application.
This is not the ideal way of terminating your Java process.
If you have to kill your JVM with kill -9, the JVM won't be able to clean up after itself, leaving operating-system resources in limbo. :-(
Add some functionality to your app to make it exit gracefully. If you have no choice, try to kill your JVM with -15; that may let it clean up after itself.
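As a sketch of "exit gracefully" in plain Java, a JVM shutdown hook can close the listening socket when the process receives SIGTERM (kill -15). Shutdown hooks are not run on SIGKILL (kill -9), which is exactly the point made above. GracefulShutdown and installCloseHook are hypothetical names; in a real Jetty deployment you would stop the server from the hook rather than close a bare ServerSocket. The hook is written with an anonymous Runnable, so that part also compiles on Java 1.6.

```java
import java.io.IOException;
import java.net.ServerSocket;

public class GracefulShutdown {

    // Registers a JVM shutdown hook that closes the listening socket.
    // Hooks run on normal exit, System.exit() and SIGTERM (kill -15),
    // but NOT on SIGKILL (kill -9): the kernel removes the process
    // without giving the JVM a chance to run any code.
    public static Thread installCloseHook(final ServerSocket server) {
        Thread hook = new Thread(new Runnable() {
            public void run() {
                try {
                    server.close();
                    System.err.println("listening socket closed");
                } catch (IOException e) {
                    System.err.println("error closing listener: " + e);
                }
            }
        }, "close-listener-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(8898);   // port from the question
        installCloseHook(server);
        System.out.println("running; send SIGTERM to stop");
        Thread.sleep(Long.MAX_VALUE);                    // stand-in for the real server loop
    }
}
```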
If your Java program really is hanging the JVM, then you need to get a debugger and squash those pests.
Killing a process and restarting it is a hack, not a fix. You should use SIGKILL only if a process is not responding to any other method.
I usually try
kill -15
and only then kill -9, as a last resort.
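The same "SIGTERM first, SIGKILL only as a last resort" policy, sketched in Java for a watchdog or restart tool written in Java instead of a shell script. It assumes Java 9 or later for ProcessHandle; in OpenJDK on Unix-like systems destroy() sends SIGTERM and destroyForcibly() sends SIGKILL. TermThenKill and the grace periods are illustrative choices, not anything from the original setup.

```java
import java.util.Optional;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TermThenKill {

    // Asks the process to stop with SIGTERM, waits up to graceSeconds,
    // and only then escalates to SIGKILL. Returns true if the process exited.
    public static boolean stop(long pid, long graceSeconds) throws InterruptedException {
        Optional<ProcessHandle> found = ProcessHandle.of(pid);
        if (!found.isPresent()) {
            return true;                          // no such process: nothing to stop
        }
        ProcessHandle ph = found.get();
        ph.destroy();                             // polite request: kill -15
        if (waitForExit(ph, graceSeconds)) {
            return true;
        }
        ph.destroyForcibly();                     // last resort: kill -9
        return waitForExit(ph, 5);
    }

    // Waits for the process to terminate; false if it is still running after the timeout.
    private static boolean waitForExit(ProcessHandle ph, long seconds) throws InterruptedException {
        try {
            ph.onExit().get(seconds, TimeUnit.SECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        } catch (ExecutionException e) {
            return !ph.isAlive();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long pid = Long.parseLong(args[0]);       // PID of the server JVM to stop
        System.out.println("stopped: " + stop(pid, 10));
    }
}
```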
and for fun...
http://www.youtube.com/watch?v=Fow7iUaKrq4
Since you only do this manually, you may have to add another check.
and kill the pid associated with your open socket, even if it is bash or sh.
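If you'd rather not depend on a particular command-line tool, the PIDs holding a listening port can also be found directly on Linux: look up the socket's inode in /proc/net/tcp and /proc/net/tcp6, then scan /proc/<pid>/fd for a link to socket:[inode]. A Linux-only Java sketch of that lookup follows; PortOwners and pidsListeningOn are made-up names, and processes belonging to other users are only visible when it runs as root.

```java
import java.io.IOException;
import java.nio.file.DirectoryIteratorException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PortOwners {

    // Inodes of sockets in LISTEN state (st == "0A") on the given local port,
    // read from the kernel's TCP tables for IPv4 and IPv6.
    static Set<String> listeningInodes(int port) throws IOException {
        Set<String> inodes = new HashSet<>();
        String portHex = String.format("%04X", port);
        for (String table : new String[] {"/proc/net/tcp", "/proc/net/tcp6"}) {
            Path path = Paths.get(table);
            if (!Files.exists(path)) {
                continue;
            }
            List<String> lines = Files.readAllLines(path);
            for (int i = 1; i < lines.size(); i++) {           // line 0 is the header
                String[] f = lines.get(i).trim().split("\\s+");
                if (f.length < 10) {
                    continue;
                }
                // f[1] = local "ADDR:PORT" in hex, f[3] = state, f[9] = inode
                String localPort = f[1].substring(f[1].indexOf(':') + 1);
                if (localPort.equals(portHex) && f[3].equals("0A")) {
                    inodes.add(f[9]);
                }
            }
        }
        return inodes;
    }

    // PIDs with a file descriptor pointing at one of those listening sockets.
    public static List<Long> pidsListeningOn(int port) throws IOException {
        Set<String> inodes = listeningInodes(port);
        List<Long> pids = new ArrayList<>();
        if (inodes.isEmpty()) {
            return pids;
        }
        try (DirectoryStream<Path> procs = Files.newDirectoryStream(Paths.get("/proc"), "[0-9]*")) {
            for (Path proc : procs) {
                try (DirectoryStream<Path> fds = Files.newDirectoryStream(proc.resolve("fd"))) {
                    for (Path fd : fds) {
                        String target = Files.readSymbolicLink(fd).toString();   // e.g. "socket:[12345]"
                        if (target.startsWith("socket:[")
                                && inodes.contains(target.substring(8, target.length() - 1))) {
                            pids.add(Long.parseLong(proc.getFileName().toString()));
                            break;
                        }
                    }
                } catch (IOException | DirectoryIteratorException e) {
                    // permission denied, or the process/descriptor vanished while scanning
                }
            }
        }
        return pids;
    }

    public static void main(String[] args) throws IOException {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 8898;   // port from the question
        System.out.println("PIDs listening on port " + port + ": " + pidsListeningOn(port));
    }
}
```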
Also, you mentioned that SIGTERM works most of the time. If that's the case, your app should catch SIGTERM and run graceful exit code that resets (RST) all open connections and then closes the listening socket.
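A minimal sketch of that idea, assuming you keep track of accepted connections yourself; the OPEN set and the ResetOnShutdown class are hypothetical, since Jetty manages its connections internally. Closing a socket after setSoLinger(true, 0) makes the kernel send an RST instead of the normal FIN handshake, so the server side does not linger in TIME_WAIT. As with any shutdown hook, this runs on SIGTERM but not on SIGKILL.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ResetOnShutdown {

    // Client connections currently open; the accept loop adds, workers remove.
    static final Set<Socket> OPEN = ConcurrentHashMap.newKeySet();

    // Closes every tracked connection abortively (SO_LINGER = 0 -> RST),
    // then closes the listening socket itself.
    public static void resetAllAndClose(ServerSocket listener) {
        for (Socket s : OPEN) {
            try {
                s.setSoLinger(true, 0);   // abortive close: send RST, skip TIME_WAIT
                s.close();
            } catch (IOException e) {
                // connection already broken; nothing else to do
            }
            OPEN.remove(s);
        }
        try {
            listener.close();
        } catch (IOException e) {
            System.err.println("could not close listener: " + e);
        }
    }

    public static void main(String[] args) throws IOException {
        final ServerSocket listener = new ServerSocket(8898);   // port from the question
        // Runs on SIGTERM (kill -15), not on SIGKILL (kill -9).
        Runtime.getRuntime().addShutdownHook(new Thread(() -> resetAllAndClose(listener)));
        while (!listener.isClosed()) {
            try {
                Socket client = listener.accept();
                OPEN.add(client);
                // hand the client to a worker that removes it from OPEN when done
            } catch (IOException e) {
                break;   // listener was closed by the shutdown hook
            }
        }
    }
}
```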
HTH
If you have access to the source code, you need to create the socket with the SO_REUSEADDR option mentioned by Jacek. Also of interest are the tcp_tw_recycle and tcp_tw_reuse kernel flags on Linux (tcp_tw_recycle was removed in Linux 4.12). The real problem is in the protocol design, which you may or may not be able to change. Interesting threads on the topic:
With your update I have another explanation. The sh process keeping the socket open must be a child of your application, forked after the listening socket has been opened. It didn't die with its parent and was adopted by the init process.
You should try to find out what that shell process is for (probably some script started by your application) and why it is not terminating. Maybe fixing the script so that it terminates after finishing its job will be enough. Otherwise, there may be a way to keep it from detaching from the parent (if it stays in the same process group, killing the whole group also kills it), or to make it close all the unneeded file descriptors inherited from the parent.
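One way to keep helper processes from outliving the JVM, sketched under the assumption that the shell process is started by the application itself (for example through ProcessBuilder or Runtime.exec): on shutdown, send SIGTERM to every descendant process and fall back to SIGKILL if one does not exit in time. This needs Java 9+ for ProcessHandle, ChildCleanup is a made-up name, and like any shutdown hook it does nothing if the JVM itself is killed with SIGKILL.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ChildCleanup {

    // Sends SIGTERM to every process started (directly or indirectly) by this JVM,
    // waits up to waitMillis for each, and sends SIGKILL to any still running.
    public static void terminateDescendants(long waitMillis) {
        ProcessHandle.current().descendants().forEach(child -> {
            child.destroy();                                   // SIGTERM on Unix
            try {
                child.onExit().get(waitMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                child.destroyForcibly();                       // SIGKILL
            } catch (ExecutionException | TimeoutException e) {
                child.destroyForcibly();                       // SIGKILL
            }
        });
    }

    // Registers the cleanup; it runs on normal exit or SIGTERM, never after SIGKILL.
    public static void installHook() {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> terminateDescendants(2000)));
    }

    public static void main(String[] args) throws Exception {
        installHook();
        new ProcessBuilder("sh", "-c", "sleep 600").start();   // stand-in for the helper script
        System.out.println("helper started; send SIGTERM to this JVM to test");
        Thread.sleep(Long.MAX_VALUE);
    }
}
```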
You may try:
to see what other files it keeps open. Most probably, one of them will be the shell script. Once we know what it is, we may find a way to fix the problem.
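On Linux the same information is available under /proc/<pid>/fd, where each entry is a symbolic link to whatever the descriptor refers to: a file path, or socket:[inode] for sockets. A small Java sketch that lists those links for a given PID, such as the sh process from the netstat output; OpenFiles is a hypothetical name, and reading another user's process typically requires root.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

public class OpenFiles {

    // Maps each file descriptor number of a process to what it points at,
    // e.g. 3 -> "/path/to/script.sh" or 5 -> "socket:[12345]". Linux-only.
    public static Map<Integer, String> openFiles(long pid) throws IOException {
        Map<Integer, String> result = new TreeMap<>();
        Path fdDir = Paths.get("/proc", Long.toString(pid), "fd");
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(fdDir)) {
            for (Path fd : fds) {
                try {
                    result.put(Integer.parseInt(fd.getFileName().toString()),
                               Files.readSymbolicLink(fd).toString());
                } catch (IOException e) {
                    // descriptor closed while we were listing; skip it
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        long pid = Long.parseLong(args[0]);   // e.g. the sh PID shown by netstat
        for (Map.Entry<Integer, String> e : openFiles(pid).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```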