We recently had some 500s from nginx itself that somehow were not logged (we have screenshots, but nothing in the logs). That is weird in itself, because errors usually show up there. Regardless, I am wondering whether there is something like a connection pool size that would result in a 500 if it were maxed out. We have tentatively correlated the errors with a recent spike in traffic, but that is not conclusive.
Anyone have any ideas of how to begin to approach such an issue?
We use a combination of a custom nginx log format and lmon to catch things like this.
A log format that records the upstream server that handled the request, and that puts the status at the front of each line, captures a lot of helpful diagnostic info and stays easy to read even if the logs are scrolling by pretty fast.
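The exact format from the original post is not reproduced here. A minimal sketch of that kind of format, assuming a format name of "diag" and a log path of /var/log/nginx/access.log (both made up for illustration), might look like the following. The variables themselves are standard nginx ones.

```nginx
# Goes in the http {} context. "diag" and the log path are illustrative.
log_format diag '$status $upstream_status $upstream_addr '
                '$request_time $upstream_response_time '
                '$remote_addr [$time_local] "$request" '
                '$body_bytes_sent "$http_referer" "$http_user_agent"';

access_log /var/log/nginx/access.log diag;
```

With a format like this, a line whose status is 500 but whose upstream fields are logged as "-" suggests that nginx produced the error itself rather than passing one through from a backend.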
We use lmon to watch these logs and alert us (pager/email) when it sees errors such as 500s, 503s, or 400s in the logs:
http://www.bsdconsulting.no/tools/lmon-README
This can alert you to an issue while it's happening, which is the easiest time to debug it.
The other thing you should probably consider, if you haven't already, is that by default nginx treats a 500 from an upstream as a final response and doesn't try another upstream. If you have multiple upstreams, you can configure it to try another one when it gets a 500, hopefully hiding the failure from the user:
http://wiki.nginx.org/NginxHttpProxyModule#proxy_next_upstream
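As a rough sketch of what that could look like, assuming an upstream group named "backend" with two placeholder server addresses (none of these names come from the original post):

```nginx
# "backend" and the server addresses are placeholders.
upstream backend {
    server 192.0.2.10:8080;
    server 192.0.2.11:8080;
}

server {
    listen 80;

    location / {
        proxy_pass http://backend;
        # The default is "error timeout". Adding http_500 and http_503
        # makes nginx try the next server when one answers with those codes.
        proxy_next_upstream error timeout http_500 http_503;
    }
}
```

Note that this only helps with 500s returned by a backend. It does nothing for a 500 that nginx generates on its own, and newer nginx versions do not retry non-idempotent requests such as POST unless told to.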
error_log $filename debug;
will turn on debug-level logging in the error log. This gives you lots and lots of detail about nginx's internal state at the time of the error, and if nginx was compiled with --with-debug (which several distros do by default) it'll give even more. Be warned that the "debug" level really does generate a lot of output, to the point that you may want to watch your disk space.
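If the volume is a concern, one option is to limit debug output to selected clients with the debug_connection directive. This is a minimal sketch, assuming a --with-debug build, a test client at the placeholder address 192.0.2.10, and an illustrative log path:

```nginx
# Main context: keep the global error log at a quieter level.
error_log /var/log/nginx/error.log warn;

events {
    worker_connections 1024;
    # Debug-level logging only for connections from this client address
    # (192.0.2.10 is a placeholder). Requires a --with-debug build.
    debug_connection 192.0.2.10;
}
```

Connections from other addresses keep logging at the level set by error_log, so the log only grows quickly while the test client is reproducing the problem.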
In my case the conf file was not named correctly (it was example.com instead of example.com.conf), so it was not included. Somehow this did not result in 'Welcome to nginx' but in an HTTP 500 error that appeared not to be logged. Well, it was logged actually, but in the error log of a different virtual host, which could not handle that particular URL.
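The poster's actual include line is not shown, but a typical setup that would behave this way looks roughly like the sketch below (the path is an assumption):

```nginx
# In nginx.conf. The path is illustrative.
http {
    # Only files ending in .conf are picked up: "example.com.conf" is
    # loaded, while a file named just "example.com" is silently skipped.
    include /etc/nginx/conf.d/*.conf;
}
```

When no server block matches the request's Host header, nginx hands the request to the default server for that listen address, which would explain why the error ended up in another virtual host's log. On reasonably recent versions, running nginx -T prints the full configuration as nginx actually loaded it, which makes a skipped file easy to spot.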