The problem doesn't occur very often, but it does happen, and I'm not sure where to start. I grepped /var/log/ for the mongrel PIDs, and the only messages that contain them look like this:
Jun 7 07:46:24 staging kernel: 4gb seg fixup, process mongrel_rails (pid 29498), cs:ip 73:00937a5c
It has something to do with the Xen-specific version of libc, but it's not critical: the processes keep running while these messages accumulate in kern.log.
I'm looking not only for a specific solution (which probably can't be provided from the description above) but also for any advice on how to set up monitoring or investigate such cases.
We use nagios to monitor our mongrels (along with hundreds of other services).
It just checks that there is a mongrel process running on each of the required ports. If one is missing, it restarts it.
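For anyone without nagios at hand, the same kind of check can be sketched in a few lines of Python. This is only an illustration of the idea, not the nagios plugin itself: the function name `closed_ports` is made up, and it assumes the mongrels listen on TCP ports on the host you pass in.

```python
import socket

def closed_ports(host, ports, timeout=1.0):
    """Return the ports on `host` that do not accept a TCP connection."""
    down = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success and an errno value otherwise,
            # so a refused or timed-out connection does not raise here.
            if sock.connect_ex((host, port)) != 0:
                down.append(port)
    return down
```

You would call it with your own port list, for example `closed_ports("127.0.0.1", [8000, 8001, 8002])` (those port numbers are placeholders), and restart whatever comes back. Note that this only proves something is accepting connections; a mongrel that accepts a connection but never answers would still pass, so an HTTP-level check is a stronger test.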
I got these messages when libc6-xen was not installed in the Xen domU, so verify that you have that package installed.
With another variant of libc everything still works, but more slowly, because the kernel has to catch the bad operation and do the right thing instead. The quoted message is generated by the kernel in precisely that situation.
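If you want to see how often this happens and to which processes, a small script can tally the messages per PID. This is a rough sketch: the regular expression is based only on the single log line quoted in the question, so other kernels may format the message differently, and `count_fixups` is just a name picked for the example.

```python
import re
from collections import Counter

# Pattern derived from the one kern.log line quoted in the question.
FIXUP_RE = re.compile(r"4gb seg fixup, process (\S+) \(pid (\d+)\)")

def count_fixups(lines):
    """Count '4gb seg fixup' messages per (process name, pid)."""
    counts = Counter()
    for line in lines:
        match = FIXUP_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts
```

To run it against the real log, pass an open file: `with open("/var/log/kern.log") as f: print(count_fixups(f).most_common(10))`. If the counts drop to zero after installing libc6-xen, the package was the cause of the messages.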
So you guessed right: that doesn't explain why mongrel stops. Check mongrel's documentation to see whether it has debugging logs you can enable. Otherwise you can always strace the process until it fails; the end of the trace will give you hints about how it fails, and maybe you'll find out why.
You might look at god for monitoring and managing your mongrels. It's quite flexible, and you can use it to restart a process based on thresholds such as memory usage, CPU usage, flapping and more. You might also consider monit, which I know some people use in place of god.
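To make the threshold idea concrete, here is a sketch of the kind of rule such tools apply: restart only when memory is over the limit in several recent samples, so a single spike doesn't trigger a restart. This is not god's or monit's configuration syntax or API; `should_restart` and its parameters are hypothetical names for illustration.

```python
def should_restart(samples_mb, limit_mb, times, window):
    """Return True if at least `times` of the last `window` samples exceed `limit_mb`."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent = samples_mb[-window:]
    over_limit = sum(1 for sample in recent if sample > limit_mb)
    return over_limit >= times
```

For example, `should_restart([100, 160, 170, 120, 180], limit_mb=150, times=3, window=5)` is true because three of the five samples are above 150 MB, while a single spike in an otherwise quiet window would not trigger a restart.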
Not exactly an answer, but must you use mongrel? I switched to apache + passenger and never looked back.