So, we may have some sort of zany leak in our software. We're using Mono and spawning many processes over the course of weeks/months. Eventually, we can't spawn any more on our clients' machines. It usually takes ~20 hours for things to stop. Closing and re-opening our application fixes it.
When spawning fails, there are fewer than 500 total processes running and fewer than 1000 file handles over the entire system. The ulimits on files are set to high-ish levels, and the process limit is around 8K, I believe. We're running CentOS 6.2.
If we are leaking these PIDs or handles and somehow the standard ps and lsof commands just aren't showing them (neither is /proc), I need a way to tap into the kernel or something else to see what the current values are that these limits test against.
Once we confirm that's the problem, I get the fun task of trying to decipher what's causing it... But that's for another day.
This application works on many many many other Linux machines with no problems as far as we know (other clients haven't reported this problem to us).
Any ideas how I could find the value of the metrics that ulimit sets bounds for? I'm desperately hoping I don't have to write C programs myself, but I'm not above doing it if necessary.
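For what it's worth, here is the kind of check I have in mind, short of writing C: a minimal sketch that reads /proc directly, assuming Python 3 is available on the box and the script runs as the application's user or root. The function names (parse_limits, count_open_fds, real_uid, count_tasks_for_uid, report) are made up for illustration. One thing it does differently from plain ps: on Linux the per-user process limit counts threads as well, so it counts entries under /proc/<pid>/task rather than just PIDs. The task count is only an approximation of whatever the kernel actually compares against the limit.

```python
import os
import re
import sys


def parse_limits(text):
    """Parse the contents of /proc/<pid>/limits into {name: (soft, hard)}."""
    limits = {}
    for line in text.splitlines()[1:]:  # first line is the column header
        # Columns are padded with runs of spaces; limit names use single spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) >= 3:
            limits[fields[0]] = (fields[1], fields[2])
    return limits


def count_open_fds(pid):
    """Count descriptors listed for pid (needs the same user or root)."""
    return len(os.listdir("/proc/{0}/fd".format(pid)))


def real_uid(pid):
    """Return the real UID from the 'Uid:' line of /proc/<pid>/status."""
    with open("/proc/{0}/status".format(pid)) as fh:
        for line in fh:
            if line.startswith("Uid:"):
                return int(line.split()[1])
    return None


def count_tasks_for_uid(uid):
    """Count tasks (threads included) whose real UID is uid."""
    total = 0
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            if real_uid(entry) == uid:
                total += len(os.listdir("/proc/{0}/task".format(entry)))
        except (IOError, OSError):
            continue  # process exited or is unreadable; skip it
    return total


def report(pid):
    """Gather current usage next to the limits that apply to pid."""
    with open("/proc/{0}/limits".format(pid)) as fh:
        limits = parse_limits(fh.read())
    return {
        "open_fds": count_open_fds(pid),
        "max_open_files": limits.get("Max open files"),
        "tasks_for_uid": count_tasks_for_uid(real_uid(pid)),
        "max_processes": limits.get("Max processes"),
    }


if __name__ == "__main__" and os.path.isdir("/proc/self/fd"):
    # Pass the application's PID as the first argument; defaults to this script.
    has_pid = len(sys.argv) > 1 and sys.argv[1].isdigit()
    target = int(sys.argv[1]) if has_pid else os.getpid()
    for key, value in sorted(report(target).items()):
        print("{0}: {1}".format(key, value))
```

Running this against the Mono process's PID every few minutes and logging the output should show whether open_fds or tasks_for_uid creeps toward its soft limit before spawning starts to fail.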