I would like to have a highly-available MySQL system, with automatic failover, running on Amazon EC2 instances.
The standard approach to solving this problem is Heartbeat + DRBD, but I've found a lot of posts suggesting DRBD doesn't work on EC2, though none saying exactly why. Obviously, a serial heartbeat or a distinct network is out of the question in the virtualised environment. It would also be good to have the different servers in different availability zones, but we're getting into a much harder problem there.
What are people's opinions on building a high-uptime solution in "the cloud"?
Note: This question was asked before RDS with multi-AZ was announced, which is the nice automatic answer for today's modern IT professional. :)
I think you really want a multi-zone RDS setup, which was recently added to AWS.
Read more here: http://aws.typepad.com/aws/2010/05/amazon-rds-multi-az-deployment.html
If you weren't asking about AWS, I'd suggest a setup including DRBD, which would make sure that both servers stay in sync all the time. But I'm almost 100% sure this isn't possible yet on AWS.
Generally, I'd be careful about snapshotting and all that - it's not a silver bullet! It takes a good while on AWS. The instance storage itself is a) not fast at all and b) not persistent! Even EBS is not really fast, and you still need to stop the I/O for a consistent snapshot.
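To illustrate the point about stopping I/O, here is a minimal Python sketch of pausing MySQL writes around a snapshot. The `cursor` argument is assumed to be any DB-API style cursor, and `take_snapshot` is a hypothetical callable that starts whatever snapshot mechanism you use; neither comes from the original post. The read lock only holds while the same connection stays open, and it does not necessarily freeze the filesystem underneath, so treat this as a starting point rather than a complete recipe.

```python
from contextlib import contextmanager


@contextmanager
def writes_paused(cursor):
    """Hold a global read lock for the duration of the with-block."""
    # Blocks writes on the server until UNLOCK TABLES is issued on this
    # same connection.
    cursor.execute("FLUSH TABLES WITH READ LOCK")
    try:
        yield
    finally:
        # Always release the lock, even if the snapshot call fails.
        cursor.execute("UNLOCK TABLES")


def consistent_snapshot(cursor, take_snapshot):
    """Run take_snapshot() (hypothetical callable) while writes are paused."""
    with writes_paused(cursor):
        return take_snapshot()
```

The `finally` clause matters here: if the snapshot call raises, the lock is still released, so a failed backup does not leave the database read-only.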
Cheap, easy option: install MySQL in different datacenters in EC2 yourself and set up master/master replication between them. Point the front-end webservers in each datacenter at those replicated MySQL servers. Then set up automated DNS failover between the front-end webservers in each location. If a content health check fails on your primary site, client traffic is automatically redirected to the replicated site in the other datacenter until you fix the primary site and the health checks start passing again, at which point traffic automatically falls back to the primary site. I do this all the time, even between different vendors, e.g. EC2 and Linode. It works great, and client traffic failover happens in less than 1 minute. You can get automated DNS failover from dnshat.com for cheap.
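The health-check logic described above can be sketched roughly as follows. This is only an illustration in Python: the function names, the `expected_text` marker, and the injectable `fetch` parameter are assumptions made for the example, not part of any DNS provider's API, and a real failover service would also handle the DNS record update itself.

```python
import urllib.request


def _http_get(url, timeout):
    """Fetch a page body as text using the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def site_is_healthy(url, expected_text, timeout=5.0, fetch=None):
    """Content health check: the page must load AND contain expected_text."""
    fetch = fetch or _http_get
    try:
        body = fetch(url, timeout)
    except Exception:
        # Timeouts, connection errors, and HTTP error statuses all count
        # as an unhealthy site.
        return False
    return expected_text in body


def choose_dns_target(primary_ip, secondary_ip, primary_healthy):
    """Prefer the primary whenever it is healthy; otherwise fail over.

    Because the decision is re-evaluated on every check, traffic falls
    back to the primary as soon as its health check passes again.
    """
    return primary_ip if primary_healthy else secondary_ip
```

Checking for specific content rather than just a successful response is what makes this a content health check: a page that loads but shows a database error would still be treated as down.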
I'd default to active/passive dual master replication using a floating VIP. (Heartbeat, OpenAIS, MMRM, or Pacemaker)
I can't think of a reason why this isn't a good idea. Can you?
MMRM
The do-it-yourself option would be to install MySQL on an EBS volume and use an Elastic IP or dynamic DNS to switch which server you're pointing at on failure.
You'll need an external server monitoring the heartbeat, which would then unmount the EBS volume, remount it on your backup server, and either remap the IP or change the DNS. If you're worried about the filesystem itself, you'll have to do LVM snapshotting or something similar to get copies of your data, which you can then back up to S3 or an EBS volume as well.
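As a rough sketch of what that external monitor might do, the Python below counts missed heartbeats and then runs the failover steps in order. Everything here is hypothetical scaffolding: `HeartbeatMonitor`, `max_missed`, and the three callables are names invented for the example. In practice the callables would wrap whatever you use to move the EBS volume and remap the Elastic IP or DNS record.

```python
class HeartbeatMonitor:
    """Tracks consecutive missed heartbeats from the primary server."""

    def __init__(self, max_missed=3):
        # max_missed is an assumed threshold; tune it to your tolerance
        # for false alarms versus failover delay.
        self.max_missed = max_missed
        self.missed = 0

    def record(self, alive):
        """Record one heartbeat result; return True when failover is due."""
        if alive:
            self.missed = 0
        else:
            self.missed += 1
        return self.missed >= self.max_missed


def fail_over(release_volume, mount_on_backup, repoint_clients):
    """Run the failover steps strictly in order.

    Each argument is a hypothetical zero-argument callable:
      release_volume  - unmount the EBS volume from the failed server
      mount_on_backup - remount the volume on the backup server
      repoint_clients - remap the Elastic IP or change the DNS record
    """
    release_volume()
    mount_on_backup()
    repoint_clients()
```

Requiring several consecutive misses before acting is a deliberate choice: moving the volume on a single dropped heartbeat risks pulling storage out from under a server that is still running.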
I like having the data on the EBS volume itself because you can grab EBS snapshots of it for backup without getting involved with the LVM stuff, if that sounds scary to you.
Also note that Amazon has an Enterprise MySQL package, which I haven't used but which is probably a better option. Their prices are usually pretty reasonable for support contracts.