We have serious stability issues with mysqld running on Linux hosts in EC2, with all of its data and log files stored on an EBS volume. We keep a slave purely for hot backup and failover, and when the master goes down, we can usually bring up the slave as a master without any issues, and then create a new slave.
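The promotion itself is nothing fancy; roughly something like this on the slave, give or take the details of our setup:

    # on the slave, once the master is confirmed dead (simplified; adjust for your replication setup)
    mysql -u root -p -e "STOP SLAVE; RESET SLAVE;"
    # then we repoint the application at the promoted host and rebuild a fresh slave from a backup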
The real problem is that our master just goes down. The host itself keeps running fine, but mysqld stops responding to anything and can't even be killed with kill -9.
This happens in both our production and staging environments, which are similar, except that production runs on large instances (CentOS 5.2 x86_64) and staging on medium instances (CentOS 5.2 i686).
Has anybody experienced similar mysqld stability problems in EC2, and if so, how did they deal with them?
Thanks in advance.
If mysqld won't die even with a kill -9, then the problem is almost certainly that it's stuck in uninterruptible sleep waiting for disk IO. That strongly suggests you've got a dud EBS volume, which happens sometimes. If you're feeling excessively optimistic you can try contacting Amazon support, but the quickest fix is to create a new EBS volume, restore onto it from your slave or a recent snapshot, and use that (hopefully you'll land on a less-crap storage unit), or try moving to a different availability zone. Yes, they're bollocks options, but EC2 just glitches up like that sometimes and you're effectively screwed.
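A quick way to confirm it, assuming you can still get a shell on the box, is to look for processes stuck in the D (uninterruptible sleep) state and check whether the EBS device has gone unresponsive:

    # list anything stuck in uninterruptible sleep (state D) -- hung mysqld threads will show up here
    ps -eo pid,stat,comm,wchan | awk '$2 ~ /^D/'

    # per-device IO stats (needs the sysstat package); a device pegged at 100% util
    # with zero throughput usually means the volume itself is hung
    iostat -x 5

If mysqld is sitting in D state on the EBS device, no amount of kill will help until the IO completes or the box gets rebooted.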
Agreed. We have some long-running EC2 MySQL instances and have had no issues. It sounds like a hardware issue specific to your environment.
Try to connect as root (i.e. the MySQL root user, not the OS root user). It's possible that mysqld has hit its max_connections limit, which stops new connections from being accepted. MySQL reserves one extra connection for an account with the SUPER privilege, so root can usually still get in even when normal clients can't.
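If you do get in that way, and assuming it really is a connection-limit problem rather than a hang, something like this will confirm it:

    # see what all the existing connections are doing
    mysql -u root -p -e "SHOW PROCESSLIST;"

    # compare the current connection count against the configured limit
    mysql -u root -p -e "SHOW VARIABLES LIKE 'max_connections'; SHOW STATUS LIKE 'Threads_connected';"

If Threads_connected is sitting at max_connections, you're out of connections rather than hung, and raising the limit or killing the stuck clients should get you back in.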