I have a server system running on Mac OS X 10.4 (Darwin Kernel Version 8.10.1). This server is used mainly as a Bugzilla Server, but there are some other web-based services running (Testlink, TikiWiki).
The Bugzilla database has about 60000 bugs in it, and there are about 300 active users on the system.
Bugzilla is on version 3.0, running on Perl 5.8.6 and Apache 1.3.33 with MySQL 5.0.38.
From time to time, we have serious trouble that Bugzilla throws a database error:
Software error:
Can't connect to the database.
Error: Too many connections
I already have several leads on possible solutions to this problem, but I wanted to bring up a more general question: how would you debug these types of problems?
Right now we have set up the following to monitor the MySQL database (a rough sketch of the configuration is shown after the list):
- A cron job which dumps the full mysql processlist every 5 minutes
- Enabled log-slow-queries in my.cnf to log queries that take longer than 15 seconds
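In concrete terms, the setup looks roughly like this (the paths, cron schedule and password placeholder are illustrative, not our exact values):

    # /etc/crontab: dump the full processlist every 5 minutes
    # (password and log paths are placeholders)
    */5 * * * * root mysqladmin -u root -pXXXX --verbose processlist >> /var/log/mysql/processlist.log

    # my.cnf, [mysqld] section: log queries slower than 15 seconds
    log-slow-queries = /var/log/mysql/slow-queries.log
    long_query_time  = 15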
We just started gathering this data to see if we can find a reason for the "Too many connections" problem.
Are there any other things you can think of to monitor a MySQL database and to help diagnose the root cause of the problem?
There are two distinct plans of attack to follow when diagnosing these types of errors:
Firstly, there is the potential that it is an issue relating to the actual software in use: something is essentially sucking up connections and not releasing them back - either not at all (a hanging thread) or not in a reasonable amount of time (a slow query).
The slow query log is very beneficial in diagnosing problems, but your value of 15 seconds is next to useless: if a query is taking 15 seconds then you're pretty screwed full stop. As a general rule of thumb, I look for queries taking more than one or two seconds to run. Work through whatever shows up in this log using the EXPLAIN keyword and look at what is causing the slowdown (bad joins, sorting needing a temp table etc.) - some clever magic with query caching and indexes can often help out if it's not possible to go deeper and tinker with code / database design.
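For example, once a query shows up in the slow log, run it through EXPLAIN. The query below only uses made-up Bugzilla-style table and column names to show the idea; substitute the real statement from your log:

    -- run the offending query through EXPLAIN
    EXPLAIN SELECT bugs.bug_id, profiles.login_name
      FROM bugs
      JOIN profiles ON profiles.userid = bugs.assigned_to
     WHERE bugs.bug_status = 'NEW'
     ORDER BY bugs.delta_ts DESC;

Rows showing "Using temporary" or "Using filesort" in the Extra column, or no usable key at all, point at the joins and sorts worth fixing first.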
Also, don't overlook the general query log in MySQL. Whilst you won't want to leave it turned on (for long) on a production server, it can quickly tell you if, rather than a single query taking an age, a particular function in the software is hammering the database with hundreds of small queries. Obviously the only way to solve these sorts of issues is via refactoring the code.
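On MySQL 5.0 the general query log is enabled from my.cnf and needs a server restart (the log path is just an example):

    # my.cnf, [mysqld] section: log every statement the server receives
    # (disk-hungry - only leave this on long enough to capture the problem)
    log = /var/log/mysql/query.log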
Secondly, you need to investigate whether the configuration of the software is to blame. How many concurrent connections are you experiencing? What is max_connections actually set to in MySQL? It may be something as simple as Apache serving, say, 100 concurrent requests while MySQL is only configured to accept 20 connections - obviously something is going to give. If you can gauge how much traffic you are expecting to handle, then it just takes some common sense (and occasionally a dash of Google to find the correct setting) to balance all the components.
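You can check both sides of that equation from the mysql client; these are standard server variables and status counters:

    SHOW VARIABLES LIKE 'max_connections';    -- the configured connection ceiling
    SHOW STATUS LIKE 'Threads_connected';     -- connections open right now
    SHOW STATUS LIKE 'Max_used_connections';  -- high-water mark since the last restart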
How many Apache workers do you have? What is the maximum number of MySQL connections you have allowed? Apache spawns a CGI process per httpd worker when handling a request, so if the former is larger than the latter, Apache can try to open more connections than MySQL will allow.
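As a rough sketch, the two limits should line up - something like this, where the numbers are purely illustrative (MaxClients is the Apache 1.3 directive, max_connections the MySQL setting):

    # httpd.conf: upper bound on concurrent Apache children
    MaxClients 100

    # my.cnf, [mysqld] section: allow at least as many connections,
    # plus headroom for cron jobs, monitoring and admin sessions
    max_connections = 110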
I would suggest the following logging settings:
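Something along these lines in my.cnf covers the basics (these values are only an example, not a recommendation for your exact load):

    # my.cnf, [mysqld] section - example values only
    log-slow-queries = /var/log/mysql/slow-queries.log
    long_query_time  = 2
    log-queries-not-using-indexes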
A cron job for dumping the processlist is very useful, but in case you don't have anything ready to actually graph that data, I can recommend Munin, which has MySQL plugins for monitoring. It is quite helpful for spotting spikes, and it runs at five-minute intervals by default.
Using it over the last year, I've discovered quite a few interesting situations that would have passed completely unnoticed otherwise.
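Enabling Munin's bundled MySQL graphs is mostly a matter of symlinking the plugins into the active plugin directory and restarting munin-node. The paths below are the common Linux defaults, so adjust them for however Munin was installed on Mac OS X:

    # typical default locations - adjust for your install
    ln -s /usr/share/munin/plugins/mysql_threads     /etc/munin/plugins/
    ln -s /usr/share/munin/plugins/mysql_queries     /etc/munin/plugins/
    ln -s /usr/share/munin/plugins/mysql_slowqueries /etc/munin/plugins/
    /etc/init.d/munin-node restart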