
Question

nbv4

Asked: 2010-04-22 09:03:38 +0800 CST

Apache server completely freezes until it gets restarted


My server does this every few days. What sucks is that it always seems to do this right after I go to bed, so when I wake up, I'm greeted with the fact that my server has been down for the past 6 or 7 hours.

When I first noticed this, I added a cronjob that tries to restart the server every 15 minutes, but I guess that didn't fix it. Once I noticed the server was down, I ran this command:

/etc/init.d/apache2 restart
* Restarting web server apache2
apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.0.1 for ServerName
... waiting ...........................................................apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.0.1 for ServerName
httpd (pid 17597) already running

...which is odd, because a restart should restart the server, even if it's already running, correct? I eventually had to "stop" then "start" to get it working again.

I then looked through the logs, and found something very weird. It seems that around the time the server crashed, the logs have entries that are wildly out of order. It looks a little like this:

xx.xxx.xxx.x - - [21/Apr/2010:06:32:05 -0400] "GET / blah"
xx.xxx.xxx.x - - [21/Apr/2010:06:51:25 -0400] "GET / blah"
x.xx.xxx.xxx - - [21/Apr/2010:06:38:23 -0400] "GET / blah"
xxx.xx.xx.xx - - [21/Apr/2010:06:31:56 -0400] "GET / blah"
xxx.xx.xx.xx - - [21/Apr/2010:06:51:49 -0400] "GET / blah"
xx.xx.xxx.xx - - [21/Apr/2010:06:33:20 -0400] "GET / blah"
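One possible reading of entries like these: Apache writes an access-log line when a request finishes, while the timestamp records when the request was received, so a line that appears after a later-stamped one belongs to a request that was in flight for at least the difference. The sketch below flags such lines. The function name and the 60-second threshold are made up for illustration, and it assumes the default bracketed timestamp format and an English month name.

```python
import re
from datetime import datetime

# Matches the bracketed timestamp, e.g. [21/Apr/2010:06:32:05 -0400]
TIMESTAMP = re.compile(r"\[([^\]]+)\]")
LOG_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def find_delayed_entries(lines, min_delay_seconds=60):
    """Yield (line_number, delay_seconds) for entries logged out of order.

    The delay is only a lower bound on how long the request took.
    """
    latest = None  # newest "received" time seen so far
    for number, line in enumerate(lines, start=1):
        match = TIMESTAMP.search(line)
        if not match:
            continue  # skip lines without a timestamp
        received = datetime.strptime(match.group(1), LOG_TIME_FORMAT)
        if latest is not None and received < latest:
            delay = (latest - received).total_seconds()
            if delay >= min_delay_seconds:
                yield number, delay
        else:
            latest = received
```

Running it over the access log from the hours before a freeze, for example `list(find_delayed_entries(open(path)))` with `path` pointing at the log file, would show which requests were stuck and for roughly how long.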

I don't think the problem is memory, because this:

tells me that right before the crash, memory usage is fine.

I'm running apache with the worker mpm, here are the settings for that:

<IfModule mpm_worker_module>
  StartServers            1
  MaxClients            100
  MinSpareThreads         5
  MaxSpareThreads        10
  ThreadsPerChild        10
  MaxRequestsPerChild  3000
</IfModule>

This Apache server is running a bunch of stuff, but most of the traffic comes from a Django project I'm hosting that uses mod_wsgi. There is also a Simple Machines Forum running off of mod_fcgid. Those settings are below:

<IfModule mod_fcgid.c>
  MaxRequestsPerProcess 500
  MaxProcessCount 3

  AddHandler fcgid-script .php .fcgi
  AddHandler cgi-script .cgi .pl
  FCGIWrapper "/usr/bin/php-cgi" .php 
</IfModule>

Anyone know of anything else I can check? I've just about tweaked every single setting I can think of, yet these freezes still happen.

Edit: I have both a postgres and mysql server running on this machine, but they both work during this freeze, because my backup script ran during that 5 hour time frame, and it worked perfectly fine.

Edit2: I'm running Ubuntu Server 9.10. When the server is down, all requests just never return. The page hangs. No error messages or anything.

5 Answers


Graham Dumpleton · Answer 1 · 2010-04-22T17:55:23+08:00

Best Answer


You don't say anything about how you are using mod_wsgi or how you have it configured. As a start, I would suggest reading 'http://code.google.com/p/modwsgi/wiki/ApplicationIssues#Python_Simplified_GIL_State_API'. You are possibly using a C extension module for Python which doesn't implement full threading support properly. If you use the daemon mode of mod_wsgi, though, such deadlocks should be detected and the processes at least forcibly restarted after a period. So, if you are using embedded mode, which is discouraged, then switch to daemon mode as a start.
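To find out which mode an application is actually running in, a throwaway WSGI script along these lines can help. It is only a sketch: it relies on the `mod_wsgi.process_group` key that mod_wsgi places in the WSGI environ (an empty string in embedded mode), and `describe_mode` is a name invented here for illustration.

```python
def describe_mode(environ):
    """Report whether a request was handled in embedded or daemon mode."""
    # mod_wsgi sets this key to "" in embedded mode and to the daemon
    # process group name in daemon mode; other servers do not set it.
    group = environ.get("mod_wsgi.process_group")
    if group is None:
        return "not running under mod_wsgi"
    if group == "":
        return "embedded mode"
    return "daemon mode (process group: %s)" % group


def application(environ, start_response):
    """Minimal WSGI entry point that returns the detected mode as text."""
    body = describe_mode(environ).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Mounting this script temporarily in place of the real application and requesting it once should show which mode is in effect.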

Overall, this sort of issue, if you believe it is related to mod_wsgi, should be discussed on the mod_wsgi mailing list. Debugging stuff like this on StackOverflow/ServerFault/SuperUser is really hard.

3

voretaq7 · Answer 2 · 2010-04-22T09:21:08+08:00


Well, it appears something is causing your web server to get a metric ass-ton of requests -- If you look in your apache error log you'll probably see that you're hitting your MaxClients limit (which is why your site falls over).

Find and eliminate the source of the request storm and your problem will go away (if you're lucky it's all from one source and you can just block them at your firewall).

Alternatively you can crank MaxClients up to some insane value, but that will probably just upset the rest of your system.
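As a rough way to look for the source of a request storm, the access-log lines from around the freeze can be tallied per client address. This is a minimal sketch, assuming the client address is the first whitespace-separated field on each line (as in the common and combined log formats); `top_clients` is a hypothetical helper name.

```python
from collections import Counter


def top_clients(lines, limit=10):
    """Return the `limit` most frequent client addresses as (address, count) pairs."""
    counts = Counter()
    for line in lines:
        fields = line.split(None, 1)  # first field is the client address
        if fields:
            counts[fields[0]] += 1
    return counts.most_common(limit)
```

If one address dominates the result, blocking it at the firewall is the easy case described above. An even spread suggests the cause lies elsewhere.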

2

Dan Andreatta · Answer 3 · 2010-04-22T11:06:45+08:00


I would guess it is one of the modules, or it could be some interaction between the modules. My first suspect would be mod_wsgi, especially since you are using it with MPM worker. It should be safe, according to the developers, but it still creates a Python interpreter per process, and the Python interpreter is not exactly thread-friendly. Try switching your Django application to FastCGI, or try running Apache with MPM prefork.

Then you could try switching from mod_fcgid to mod_fastcgi, and/or try disabling other modules you may have enabled.

0

Paul · Answer 4 · 2010-04-22T11:24:31+08:00


Can you post what you have in the Apache error log when the problem happens? (It is /var/log/httpd/error_log on Red Hat-style systems; on Ubuntu it is usually /var/log/apache2/error.log.)
Also, I would like to see the parts of /var/log/messages from the same time.
And post the output of df -h (disk usage).

0

jerclarke · Answer 5 · 2011-02-04T16:36:27+08:00


Your problem could be any number of things, but since it's clear you aren't doing so already, the first thing you need to do is install Monit or some similar software. Monit is a daemon that runs on your server and, as long as the OS is running, regularly checks that the applications you define are running. You can tell it to check that Apache is available and, if it's not, restart Apache. You can also tell it to restart Apache depending on system variables like high load or full RAM. Once you have that set up, you can at least know that your site won't stay down when this happens, and Monit will email you when it takes action, so you'll have an easy log of when the problem occurs to compare with your other logs.

http://mmonit.com/monit/
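For anyone who wants to see the idea before installing Monit, here is a hand-rolled sketch of the kind of availability check such a tool performs. It is not Monit's implementation. The point is that a frozen Apache still has a running process, so the check has to make a real HTTP request with a timeout. The function name and the default timeout are assumptions chosen for illustration.

```python
import http.client
import urllib.error
import urllib.request

# Ignore proxy environment variables so the check talks to the server itself.
_DIRECT = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_responding(url, timeout=10.0):
    """Return True if `url` answers with any HTTP status within `timeout` seconds."""
    try:
        with _DIRECT.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # A 4xx/5xx reply still means the server is answering, not frozen.
        return True
    except (OSError, http.client.HTTPException):
        # Refused connection, timeout, or a broken reply: treat as down.
        return False
```

A cron job could call `is_responding("http://127.0.0.1/")` every few minutes and only act when it returns False. Given the behaviour described in the question, the action would probably need to be a stop followed by a start rather than a plain restart.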

0
