I'm having intermittent occurrences of mysql causing high load on the machine, making the web app unavailable. I'm using monit to monitor it, but it doesn't catch the issue, apparently because it can still connect to mysql. Here's my monit script for mysql:
check process mysqld with pidfile /var/run/mysqld/mysqld.pid
group database
start program = "/etc/init.d/mysql start"
stop program = "/etc/init.d/mysql stop"
if failed port 3306 protocol mysql then restart
if failed unixsocket /var/run/mysqld/mysqld.sock protocol mysql then restart
if 5 restarts within 5 cycles then timeout
When the issue occurs load on the machine is high with mysql taking almost all of the cpu. You can still "log in" to mysql using the mysql
command line tool, but any selects/updates don't respond.
What should I be using to catch when this issue pops up?
Check the processlist via the MySQL client. (
show full processlist;
) From that point, you can isolate where the query is running and whether optimization is necessary or if it should be stopped.From that point, you can
kill $NUMBER;
to terminate the problem connection as opposed to restarting MySQL.Restarting something that is already running due to undesired operation should be a last resort and is generally a bad idea. Especially with databases, as you risk your data.
Certainly, specific situations dictate different actions. For example, if you know there is a runaway memory leak in particular software, there are no operations running, and the only way to restore the resources is to restart: then do so.
Additionally, if you are losing a lot of money every minute, a restart may be reasonable. For example, if you see no path for quick restoration and you think a restart will restore service, a restart will make sense if the risk to the data or application will be less money than you are actively losing. This principle applies but may vary slightly depending on your industry, service, and SLA.
It could be locked tables or an overloaded server. Try slow query logging, as well as Warner's suggestion of processlist (tip: mytop will do this in a handy interface). Also try (regular) top to see what's competing with the database for CPU.
If you see high "load" (run queue) without processes eating (much) CPU power it could be storage related. (insufficient IOPS/throughput)