I have a public facing application server that accesses an Oracle database that resides behind a firewall.
Some of the processes executed through the application server can run for over an hour. When that happens, the firewall disconnects the session without generating any error message, so the process appears to hang.
I've set the KeepAliveInterval to 1 second and KeepAliveTime to two hours in the registry.
I've set the enable=broken option in my tnsnames.ora file.
The listener has the sqlnet expire time set to two minutes, so it's sending keepalives from that end as well.
The problem is that it doesn't seem to be working. On a different machine, I at least got a message that the firewall had disconnected the app-server (that's when I discovered the enable=broken requirement).
How do I make sure that the keep-alives are being sent?
Why does the firewall disconnect the session? Is the firewall set to close connections that have been active too long? Or to close connections that have been inactive too long? If the firewall is set to disallow connections that persist for more than an hour, there probably isn't anything you can do in the network configuration; you would need to change the configuration of the firewall itself.
From an architecture standpoint, do you really need to run hour-long processes from your application server? Normally, it would make more sense for the application server to make a request that causes a database process to run asynchronously (i.e. spawn a job using DBMS_SCHEDULER or DBMS_JOB) and then have the application server monitor the progress of the job. That way, you don't have a connection that is open continuously for an hour. And you can give the user some sort of progress indicator (particularly if the job is instrumented using, say, DBMS_APPLICATION_INFO).
For KeepAliveInterval and KeepAliveTime, I suggest that 5 seconds and 30 minutes might be more sensible values (and that's assuming that the RTT is under 5 msec). If the firewall has decided to evict the connection from its state table, the KeepAliveInterval is irrelevant.
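To put rough numbers on the keep-alive values discussed above, here is a small Python sketch. It assumes the Windows TCP/IP registry parameters KeepAliveTime and KeepAliveInterval are stored in milliseconds, and it assumes a firewall idle timeout of 60 minutes, which is only a guess since the thread never states the real value. The function names are made up for illustration.

```python
# Sketch only: convert keep-alive settings to the millisecond values the
# Windows TCP/IP registry parameters are assumed to use, and check whether
# the first probe would be sent before a firewall idle timeout expires.

def to_registry_ms(seconds: float) -> int:
    """Convert a duration in seconds to a millisecond registry value."""
    return int(seconds * 1000)


def first_probe_beats_timeout(keepalive_time_s: float,
                              firewall_idle_timeout_s: float) -> bool:
    """True if the first keep-alive probe goes out before the idle timeout."""
    return keepalive_time_s < firewall_idle_timeout_s


# ASSUMPTION: the firewall drops sessions after 60 idle minutes.
FIREWALL_IDLE_TIMEOUT_S = 60 * 60

# Values from the question (two hours / 1 second), in seconds.
current = {"KeepAliveTime": 2 * 60 * 60, "KeepAliveInterval": 1}
# Values suggested in the answer (30 minutes / 5 seconds), in seconds.
suggested = {"KeepAliveTime": 30 * 60, "KeepAliveInterval": 5}

for label, settings in (("current", current), ("suggested", suggested)):
    ok = first_probe_beats_timeout(settings["KeepAliveTime"],
                                   FIREWALL_IDLE_TIMEOUT_S)
    registry = {name: to_registry_ms(value) for name, value in settings.items()}
    print(label, registry, "probe before timeout:", ok)
```

Under that assumed timeout, a KeepAliveTime of two hours means the first probe would not be sent until long after the firewall has already dropped the idle session, whereas 30 minutes would get a probe through in time. If the firewall's limit is on total connection lifetime rather than idle time, no keep-alive setting helps.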
To make sure that the keep-alives are being sent, take a packet capture on both sides of the firewall.
Really, the first place to start looking when investigating this problem is the firewall logs.