My Apache server freezes every few days. What sucks is that it always seems to happen right after I go to bed, so when I wake up, I'm greeted with the fact that my server has been down for the past 6 or 7 hours.
When I first noticed this, I added a cronjob that tries to restart the server every 15 minutes, but I guess that didn't fix it. Once I noticed the server was down, I ran this command:
/etc/init.d/apache2 restart
* Restarting web server apache2
apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.0.1 for ServerName
... waiting ...........................................................apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.0.1 for ServerName
httpd (pid 17597) already running
...which is odd, because a restart should restart the server, even if it's already running, correct? I eventually had to "stop" then "start" to get it working again.
I then looked through the logs, and found something very weird. It seems that around the time the server crashed, the logs have entries that are wildly out of order. It looks a little like this:
xx.xxx.xxx.x - - [21/Apr/2010:06:32:05 -0400] "GET / blah"
xx.xxx.xxx.x - - [21/Apr/2010:06:51:25 -0400] "GET / blah"
x.xx.xxx.xxx - - [21/Apr/2010:06:38:23 -0400] "GET / blah"
xxx.xx.xx.xx - - [21/Apr/2010:06:31:56 -0400] "GET / blah"
xxx.xx.xx.xx - - [21/Apr/2010:06:51:49 -0400] "GET / blah"
xx.xx.xxx.xx - - [21/Apr/2010:06:33:20 -0400] "GET / blah"
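One likely explanation for the ordering: Apache writes each access-log line when the request finishes, but the %t timestamp is the time the request was received. An entry that is older than one logged before it therefore belongs to a request that was still running at least that much later. In the excerpt above, the 06:31:56 request finished no earlier than 06:51:25, roughly 19.5 minutes later. The Python sketch below scans a log for such entries; the 60-second threshold and the log path in the usage note are assumptions.

```python
import re
from datetime import datetime

# Matches the [%t] field of Apache's common/combined log format,
# e.g. [21/Apr/2010:06:32:05 -0400]
TIMESTAMP_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def parse_timestamp(line):
    """Return the request-received time of a log line, or None if absent."""
    match = TIMESTAMP_RE.search(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")


def find_slow_requests(lines, min_lag_seconds=60):
    """Yield (line, lag_seconds) for entries logged after newer-received ones.

    Each line is written when its request finishes, so a timestamp older than
    the newest one already seen means that request was still in progress at
    least `lag_seconds` after it was received.
    """
    newest = None
    for line in lines:
        ts = parse_timestamp(line)
        if ts is None:
            continue
        if newest is not None and ts < newest:
            lag = (newest - ts).total_seconds()
            if lag >= min_lag_seconds:
                yield line.rstrip("\n"), lag
        if newest is None or ts > newest:
            newest = ts
```

For example, on Ubuntu the access log is usually /var/log/apache2/access.log, so looping over find_slow_requests(open("/var/log/apache2/access.log")) and printing each lag would list the requests that hung; the earliest large lags point at when things started to stall.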
I don't think the problem is memory, because this:
tells me that right before the crash, memory usage is fine.
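Since a single reading taken before the crash can miss a short spike, one option is to record memory every minute so there are numbers from just before the next freeze. A minimal Python sketch, assuming Linux's /proc/meminfo; the chosen fields and the cron setup are assumptions:

```python
import os
import time

# Fields from /proc/meminfo worth tracking (values are in kB).
FIELDS = ("MemTotal", "MemFree", "Buffers", "Cached", "SwapTotal", "SwapFree")


def parse_meminfo(text):
    """Parse /proc/meminfo content into a {field: value} dict of integers."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def snapshot_line(values, now=None):
    """Format one timestamped line; missing fields are shown as -1."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    fields = " ".join("%s=%d" % (k, values.get(k, -1)) for k in FIELDS)
    return "%s %s" % (stamp, fields)


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    # Print one line; cron can append it to a log file with >>.
    with open("/proc/meminfo") as f:
        print(snapshot_line(parse_meminfo(f.read())))
```

A crontab line such as * * * * * python3 /root/memwatch.py >> /var/log/memwatch.log (hypothetical paths) would keep a per-minute history to compare against the freeze times.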
I'm running Apache with the worker MPM; here are the settings for it:
<IfModule mpm_worker_module>
StartServers 1
MaxClients 100
MinSpareThreads 5
MaxSpareThreads 10
ThreadsPerChild 10
MaxRequestsPerChild 3000
</IfModule>
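For reference, the worker MPM counts MaxClients in threads, and each child process runs ThreadsPerChild threads, so these settings allow at most 100 / 10 = 10 child processes serving 100 simultaneous requests. If all 100 threads were stuck on requests that never complete, new connections would wait in the listen queue, which would look like the hang described in Edit2. A small Python sketch of that arithmetic, assuming Apache 2.2 worker behaviour (MaxClients rounded down to a multiple of ThreadsPerChild and the process count capped by ServerLimit, default 16):

```python
def worker_capacity(max_clients, threads_per_child, server_limit=16):
    """Return (max child processes, max simultaneous requests) for worker MPM.

    Assumes Apache 2.2 worker behaviour: MaxClients counts threads, is rounded
    down to a multiple of ThreadsPerChild, and the process count is capped by
    ServerLimit (16 unless set explicitly).
    """
    processes = min(max_clients // threads_per_child, server_limit)
    return processes, processes * threads_per_child


# The settings from the question: MaxClients 100, ThreadsPerChild 10.
print(worker_capacity(100, 10))  # (10, 100)
```

With mod_wsgi in embedded mode, each of those child processes also loads its own copy of the Django application.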
This Apache server is running a bunch of stuff, but most of the traffic comes from a Django project I'm hosting that uses mod_wsgi. There is also a Simple Machines Forum running off of mod_fcgid. Those settings are below:
<IfModule mod_fcgid.c>
MaxRequestsPerProcess 500
MaxProcessCount 3
AddHandler fcgid-script .php .fcgi
AddHandler cgi-script .cgi .pl
FCGIWrapper "/usr/bin/php-cgi" .php
</IfModule>
Anyone know of anything else I can check? I've just about tweaked every single setting I can think of, yet these freezes still happen.
Edit: I have both a Postgres and a MySQL server running on this machine, but they both keep working during the freeze: my backup script ran during that 5-hour window, and it worked perfectly fine.
Edit2: I'm running Ubuntu Server 9.10. When the server is down, all requests just never return. The page hangs. No error messages or anything.
You don't say anything about how you are using mod_wsgi or how you have it configured. I would suggest as a start to read 'http://code.google.com/p/modwsgi/wiki/ApplicationIssues#Python_Simplified_GIL_State_API'. You are possibly using a C extension module for Python which doesn't implement full threading properly. If you use daemon mode of mod_wsgi, though, such deadlocks should be detected and the processes at least forcibly restarted after a period. So, if you are using embedded mode, which is discouraged, then switch to daemon mode as a start.
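To find out which mode is in use: mod_wsgi puts the daemon process group name in environ['mod_wsgi.process_group'], and that value is an empty string in embedded mode. A tiny WSGI application along these lines (a sketch; the mount point in the usage note is an assumption) reports it:

```python
def application(environ, start_response):
    """Report whether mod_wsgi runs this app in embedded or daemon mode."""
    group = environ.get("mod_wsgi.process_group")
    if group is None:
        # The key is only set by mod_wsgi.
        mode = "not running under mod_wsgi"
    elif group == "":
        # mod_wsgi uses an empty process group name for embedded mode.
        mode = "embedded"
    else:
        mode = "daemon (group %s)" % group
    output = ("mode=%s multithread=%s\n"
              % (mode, environ.get("wsgi.multithread"))).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(output)))])
    return [output]
```

Mounted temporarily with WSGIScriptAlias at a spare URL (for example a hypothetical /wsgi-mode), it should print mode=embedded unless WSGIDaemonProcess and WSGIProcessGroup have been configured for the site.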
Overall, this sort of issue, if you believe it is related to mod_wsgi, should be discussed on the mod_wsgi mailing list. Debugging stuff like this on StackOverflow/ServerFault/SuperUser is really hard.
Well, it appears something is causing your web server to get a metric ass-ton of requests. If you look in your Apache error log, you'll probably see that you're hitting your MaxClients limit (which is why your site falls over). Find and eliminate the source of the request storm and your problem will go away (if you're lucky, it's all from one source and you can just block them at your firewall).
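To see whether the traffic really comes from one source, count access-log entries per client address around the time of a freeze. A Python sketch, assuming the common or combined log format (client address as the first field, [%t] timestamp); the example window below is an assumption based on the log excerpt in the question:

```python
import re
from collections import Counter
from datetime import datetime, timedelta, timezone

# The [%t] field, e.g. [21/Apr/2010:06:32:05 -0400]
TIMESTAMP_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def top_clients(lines, start, end, n=10):
    """Return the n client addresses with the most requests in [start, end].

    `start` and `end` must be timezone-aware datetimes, because the log
    timestamps carry a UTC offset.
    """
    counts = Counter()
    for line in lines:
        match = TIMESTAMP_RE.search(line)
        if match is None:
            continue
        ts = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        if start <= ts <= end:
            # %h (the client address) is the first field of the line.
            counts[line.split(None, 1)[0]] += 1
    return counts.most_common(n)


# Example window around the freeze in the question (UTC-4).
EDT = timezone(timedelta(hours=-4))
window = (datetime(2010, 4, 21, 6, 30, tzinfo=EDT),
          datetime(2010, 4, 21, 7, 0, tzinfo=EDT))
```

If one address dominates the result, that is the candidate to block at the firewall.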
Alternatively you can crank MaxClients up to some insane value, but that will probably just upset the rest of your system.
I would guess it is one of the modules, or it could be some interaction between the modules. My first suspect would be mod_wsgi, especially since you are using it with MPM worker. It should be safe, according to the developers, but it still creates a Python interpreter per process, and the Python interpreter is not exactly thread-friendly. Try switching your Django application to FastCGI, or try running Apache with MPM prefork. Then you could try switching from mod_fcgid to mod_fastcgi, and/or try disabling other modules you may have enabled.
Can you post what you have in error_log (/var/log/httpd/error_log on Red Hat-style systems; on Ubuntu it is /var/log/apache2/error.log) when the problem happens?
Also, I would like to see parts from /var/log/messages from the same time.
And, post the output of df -h (disk usage).
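To check the MaxClients theory against the times of the freezes, the error log can be grouped by hour. A Python sketch, assuming Apache 2.2-style error-log timestamps such as [Wed Apr 21 06:40:12 2010] and simply matching any line that mentions MaxClients rather than one exact message:

```python
import re
from collections import Counter
from datetime import datetime

# Apache 2.2 error-log lines start like: [Wed Apr 21 06:40:12 2010] [error] ...
ERROR_TS_RE = re.compile(r"^\[(\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\]")


def hits_by_hour(lines, needle="MaxClients"):
    """Count error-log lines containing `needle`, grouped by hour.

    Returns a sorted list of ("YYYY-MM-DD HH:00", count) pairs.
    """
    hits = Counter()
    for line in lines:
        if needle not in line:
            continue
        match = ERROR_TS_RE.match(line)
        if match is None:
            continue
        ts = datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y")
        hits[ts.strftime("%Y-%m-%d %H:00")] += 1
    return sorted(hits.items())
```

On Ubuntu the file to feed it is usually /var/log/apache2/error.log; hours with many hits that line up with the freezes would support the MaxClients explanation.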
Your problem could be any number of things, but since it's clear you're not already monitoring the server, the first thing you need to do is install Monit or some similar software. Monit is a daemon that runs on your server and, as long as the OS is running, regularly checks that the applications you define are running. You can tell it to check that Apache is available and, if it's not, restart Apache. You can also tell it to restart Apache based on system variables like high load or full RAM. Once you have that set up, you at least know that your site won't stay down when this happens, and Monit will email you when it takes action, so you'll have an easy log of when the problem occurs to compare with your other logs.
http://mmonit.com/monit/
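As a rough stand-in while Monit is being set up, the same kind of check can be scripted: only restart when the site fails to answer within a timeout, and use stop followed by start, since restart reported "already running" in the question. A Python sketch; the URL, timeout and load threshold are assumptions, and it has to run as root to control Apache:

```python
import http.client
import os
import subprocess
import urllib.error
import urllib.request

INIT_SCRIPT = "/etc/init.d/apache2"  # path used in the question


def is_responsive(url, timeout=10.0):
    """True if `url` answers with a non-error status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Refused connections, timeouts and HTTP 4xx/5xx (HTTPError) end up here.
        return False


def load_too_high(threshold):
    """True if the 1-minute load average exceeds `threshold` (Unix only)."""
    return os.getloadavg()[0] > threshold


def restart_apache(run=subprocess.run):
    """Stop then start Apache; a plain 'restart' failed in the question."""
    run([INIT_SCRIPT, "stop"], check=False)
    run([INIT_SCRIPT, "start"], check=False)


def check_once(url, check=is_responsive, restart=restart_apache):
    """Restart Apache if `url` does not answer; return True if it restarted."""
    if check(url):
        return False
    restart()
    return True
```

A cron job could call check_once("http://127.0.0.1/") every few minutes and log whenever it returns True, which also gives a timestamp for each freeze.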