Are there any major alternatives for automatic failover on Linux besides the typical Heartbeat/Pacemaker/CoroSync combinations? In particular, I'm setting up failover on EC2, which supports only unicast: no multicast or broadcast. I'm specifically trying to handle the few pieces of software we have which don't already have automatic failover and don't support multi-master environments. This includes tools like HAProxy and Solr.
I have Heartbeat+Pacemaker working, but I'm not thrilled with it. Here are some of my issues:
- Heartbeat - By itself, limited to two nodes. I'd like to have 3+.
- Pacemaker - Impossible to configure automatically. Cluster has to be running with a quorum and then it still requires manual configuration.
- CoroSync - Does not support unicast.
Pacemaker works very well, although its power makes it difficult to set up. The real problem with Pacemaker is that there is no easy way to automate the configuration. I really want to launch an EC2 instance, install Chef/Puppet and have the entire cluster launch without my intervention.
I prefer to use keepalived for high availability. I find it simpler to set up (one daemon and one config file) than Heartbeat and company. The only drawback I run into is that keepalived doesn't have a unicast option by default and only uses VRRP for communication (the author of HAProxy has written a unicast patch for keepalived, however).
I am actually working on something very similar to what you described (a failover cluster on EC2), and after trying out Heartbeat, settled on Corosync as my messaging layer. Corosync will run on multiple servers and it does support unicast (UDPU) as of version 1.3.0 (from November 2010). I have set up and tested Corosync on Amazon's EC2 cloud (using Amazon's Linux AMI) and can confirm it works without issue.
A sample udpu file is installed to /etc/corosync.
Add one member block to the interface section for each node, and specify the transport as udpu. (I have used the same port as Heartbeat in the example below, but you can change it as desired.)
e.g.:
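A minimal sketch of what such a configuration might look like, generated from a list of node addresses so that Chef/Puppet (or any script) can template it. The function name, the node addresses and the bindnetaddr value are hypothetical. The layout is assumed to follow the corosync 1.x udpu sample, so compare it against the sample file in /etc/corosync before relying on it. Port 694 is Heartbeat's usual UDP port.

```python
def render_totem_udpu(members, bindnetaddr, port=694):
    """Render a corosync 1.x totem section that uses the udpu (unicast UDP) transport.

    members     -- list of node IP addresses; one member block is emitted per node
    bindnetaddr -- network address of the interface corosync should bind to
    port        -- UDP port (694 is the port Heartbeat normally uses)
    """
    lines = [
        "totem {",
        "    version: 2",
        "    secauth: off",
        "    interface {",
    ]
    # One member block per cluster node, inside the interface section.
    for addr in members:
        lines += [
            "        member {",
            f"            memberaddr: {addr}",
            "        }",
        ]
    lines += [
        "        ringnumber: 0",
        f"        bindnetaddr: {bindnetaddr}",
        f"        mcastport: {port}",
        "    }",
        "    transport: udpu",
        "}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical private addresses for a three-node cluster.
    nodes = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]
    print(render_totem_udpu(nodes, "10.0.0.0"), end="")
```

Running the script prints a totem section with three member blocks. Only that section is produced here; the rest of corosync.conf (logging and so on) would come from the installed sample.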
(Heartbeat is supposed to support 3+ node clusters in versions 1.2.3+, although I have never tried it personally and don't know if it would work with unicast.)
Sorry, but the part about Pacemaker is not true. The Pacemaker regression and release tests make extensive use of automation.
To configure without an active cluster, prefix all commands with CIB_file=/var/lib/heartbeat/crm/cib.xml or set it in your environment. Just be sure you remove the .sig file before starting the cluster.
For clusters without quorum, most if not all tools should support -f or --force, which will instruct the cluster to accept the change anyway. If you find a tool that does not, please file a bug.
In the open source world, there's Red Hat Cluster Suite. It's been several years since I've implemented RHCS so I don't have many relevant things to say about it today.
Commercially, there is Veritas Cluster Server. No experience with it.
A much simpler and open source HA tool is UCARP. UCARP doesn't provide nearly the same kind of "infrastructure" that Heartbeat/Pacemaker/CoroSync does but you can build HA solutions around it.
You can also build highly available infrastructure with virtualization technologies but these solutions tend to focus on host-level availability as opposed to application level availability.
There is Oracle Clusterware for Oracle Unbreakable Linux, though I've not used it.
If you are already using EC2, why not use Elastic Load Balancing? It will let you achieve application-level availability without having to configure failover yourself.
Veritas Cluster is great (compared to Linux Heartbeat, AIX HACMP, HP Serviceguard and Sun Cluster), but it costs lots of money. The last time I looked at it, its price was based on the number of CPU cores in the cluster. The current vendor is Symantec.
OpenSVC (https://www.opensvc.com) supports multiple heartbeat drivers and also has quorum mechanisms in case of split brain.
I managed to automatically set up a 4-node cluster made of 2 Google Cloud instances + 2 Amazon instances with Terraform + Ansible.
I wrote a failover cluster manager in POSIX shell: https://github.com/nackstein/back-to-work
Take a look at it. I'm looking for someone who wants to try it and help with development.