We run a cluster of Java Spring server apps on AWS EC2 instances running CentOS 7. We have health monitors on them, and occasionally an alarm goes off and we find that the Java process has quietly disappeared. We can find nothing in any of the logs, either our own or the system logs. We have an outer "catch Throwable" around our own code that logs what it catches, but we run Tomcat, which has many of its own threads. We've added extra logging to try to capture the moment when the process disappears, but so far that has yielded no information.
I've looked over this question: How to find out why a Java process died without a trace in Linux. I see nothing helpful there.
We currently can't involve the launcher of these processes in a solution. It's a long story. Trust me that we've tried to go down that road.
Any suggestions? I'm wondering if maybe I should wrap the Java process in an outer parent process that carefully monitors and logs all signals from the Java child process. I'm wondering if there's such an off-the-shelf solution that I haven't found yet. Any ideas would be greatly appreciated.
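To make the wrapper idea concrete, here is a minimal sketch of what I have in mind, assuming Python 3 is available on the instances. The function names `describe_exit` and `run_and_report` are made up for illustration, and the command passed in is a placeholder. It relies on the documented POSIX behavior of `subprocess`, where a negative return code `-N` means the child was terminated by signal `N`. Note that a parent can only see the signal that ended the child (through its wait status), not every signal delivered to it.

```python
import signal
import subprocess
import sys
import time


def _now():
    # Local timestamp with UTC offset, for correlating with other logs.
    return time.strftime("%Y-%m-%dT%H:%M:%S%z")


def describe_exit(returncode):
    """Turn a Popen return code into a human-readable explanation."""
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as the negated signal number.
        signum = -returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown"
        return f"killed by signal {signum} ({name})"
    return f"exited with status {returncode}"


def run_and_report(cmd, log=sys.stderr):
    """Run cmd as a child process, wait for it, and log how it ended."""
    child = subprocess.Popen(cmd)
    print(f"{_now()} started pid={child.pid}: {cmd}", file=log, flush=True)
    returncode = child.wait()  # blocks until the child terminates
    print(f"{_now()} pid={child.pid} {describe_exit(returncode)}",
          file=log, flush=True)
    return returncode


# Hypothetical usage (the jar name is a placeholder):
#   run_and_report(["java", "-jar", "app.jar"])
```

If the JVM is being killed from outside, this would log something like "killed by signal 9 (SIGKILL)", whereas a call to `System.exit` somewhere inside the process would show up as an ordinary exit status. The obvious catch is that the wrapper has to become the thing that starts Java, which is exactly the part we have trouble changing.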