At the moment there are a lot of choices for setting up a Linux cluster.
For the cluster manager: you can use Red Hat Cluster Manager, Pacemaker or Veritas Cluster Server. Red Hat Cluster Manager comes by default with a RH subscription, Pacemaker has the most community momentum, and Veritas Cluster Server is very expensive but has a very good reputation ;-)
For storage:
- You can replicate LUNs using software RAID / md devices
- You can replicate over the network using DRBD, which offers a bit more flexibility
- You can use Veritas Storage Foundation to talk to your SAN's replication technology
Does anyone have any recommendations or experience with these technologies?
I'd go with GlusterFS. The latest 3.x versions support geo-replication (the long-latency pipe type of thing) as well as LAN replication. There are plenty of docs about how to replicate and spread data across the cluster.
I don't like DRBD, because there's a limit on the number of nodes you can use. I think GlusterFS on decent hardware, with a decent bit of network tuning, might be just what you're after. Definitely worth a test session.
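If you do test it, a minimal sketch of both modes might look roughly like the commands below. This assumes GlusterFS 3.x and two test servers, server1 and server2, each exporting /export/brick - all names and paths here are placeholders, not anything from your setup:

    # two-way synchronous replication across the LAN
    gluster volume create testvol replica 2 server1:/export/brick server2:/export/brick
    gluster volume start testvol

    # asynchronous geo-replication of that volume to a remote directory (3.2+ syntax)
    gluster volume geo-replication testvol remote1:/data/remote_dir start

Clients would then mount it with something like "mount -t glusterfs server1:/testvol /mnt". Check the docs for your exact 3.x release, as the geo-replication syntax has shifted between versions.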
I am currently testing a "stretch cluster" using Red Hat Cluster Suite and DRBD. I am typing this at a hotel near the Red Hat Summit in Boston, which just ended. I talked with the Red Hat Cluster Suite developers and they said stretch clusters are not supported at this time.
This won't stop me from working on it for fun though. My setup is four HP blades in a single cluster. Two blades are in one datacenter about 15 miles from the other datacenter, which houses the other two blades. In order to get the cluster to even join together, I needed the network team to configure the routers between the sites to pass multicast traffic. In addition, since Red Hat hard-codes a TTL of "1" on the multicast heartbeat packets, I had to use iptables to mangle that multicast address to a higher TTL.
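For reference, the mangle rule I ended up with looks roughly like the one below. The multicast range and TTL value are just examples from my setup (cman normally picks an address in 239.192.0.0/16), so adjust them to whatever your cluster actually uses:

    # raise the TTL on outgoing cluster heartbeat multicast so it survives
    # the routed hop between the two sites (needs the ipt_TTL/xt_TTL module)
    iptables -t mangle -A OUTPUT -d 239.192.0.0/16 -j TTL --ttl-set 3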
After that was done, I was able to get a four-node cluster with my blades. For storage, at each site I have a 3Par LUN shared between its two local nodes. These are the block devices I use for my DRBD devices. I should add here that I have a dedicated 1G WAN link for just my DRBD traffic. I was able to get DRBD running fairly easily between the sites and use that DRBD device as a PV in a clustered LV with GFS2 running on it. I do occasionally have split-brain conditions on my DRBD setup that I must manually recover from, and I am trying to isolate that problem.
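For anyone trying to reproduce this, the DRBD resource looks roughly like the 8.3-style sketch below. The host names, LUN paths and addresses are placeholders standing in for my blades, and the rate cap just reflects my dedicated 1G link:

    resource r0 {
        protocol C;                        # synchronous; A is an option if WAN latency hurts
        syncer { rate 100M; }              # keep resync traffic inside the dedicated 1G link
        on blade-site-a {
            device    /dev/drbd0;
            disk      /dev/mapper/3par-lun-a;   # shared 3Par LUN at site A
            address   10.1.1.10:7788;
            meta-disk internal;
        }
        on blade-site-b {
            device    /dev/drbd0;
            disk      /dev/mapper/3par-lun-b;   # shared 3Par LUN at site B
            address   10.1.2.10:7788;
            meta-disk internal;
        }
    }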
The next step has been the hardest. I want to be able to fail over my GFS2 mount to the other node in case the primary fails. My GFS2 service consists of a floating IP -> DRBD -> LVM -> GFS2. The drbd.sh script that ships in the DRBD source code for clustering doesn't work at all, so I have been testing with the regular DRBD startup script in /etc/init.d. It seems to work "sometimes", so it looks like I will need to tweak it.
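For what it's worth, the rgmanager service I have been experimenting with is structured roughly as below. Treat it purely as a sketch: the names are placeholders from my test setup, and the drbd agent attributes are from memory of the drbd.sh agent shipped in the DRBD sources, which as I said doesn't work reliably for me:

    <service name="gfs2-svc" autostart="1" recovery="relocate">
        <ip address="10.1.0.100" monitor_link="1">
            <drbd name="drbd-r0" resource="r0">
                <lvm name="ha-vg" vg_name="vg_drbd" lv_name="lv_gfs2">
                    <clusterfs name="gfs2-data" device="/dev/vg_drbd/lv_gfs2"
                               mountpoint="/data" fstype="gfs2"/>
                </lvm>
            </drbd>
        </ip>
    </service>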
I was dismayed to discover that none of this is supported in Red Hat Cluster Suite, so any dream I had of moving this to production is dashed. And where else would you need this kind of setup? Pretty much only for very important production stuff.
I did talk with Symantec here and they told me they absolutely support active-active stretch clusters with shared storage. I will believe that when I actually see it though.
DRBD is dead slow, as everybody knows. You can't use it for high-load enterprise purposes. It uses 128 KiB hashing, which limits I/O requests to a maximum of 128 KiB instead of the 512 KiB a regular HDD can provide. Furthermore, there is a stupid I/O request size detection that only works while connected to the other host. If you lose the connection, it resets to 4 KiB on your local HDDs. 8.4.1 and 8.3.11 have the same issues.
Here are some more details: http://www.gossamer-threads.com/lists/drbd/users/24104
This is why real enterprises use $$$ stuff like Veritas.
MD RAID 1 is much better if you need performance at a low price. It also provides a "write-mostly" mode so that you can avoid reading from a slow device.
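A minimal sketch of that, assuming /dev/sda1 is the fast local disk and /dev/sdb1 the slow one (e.g. an iSCSI or otherwise remote device - both device names are just examples):

    # mirror the two devices, but only read from sdb1 if sda1 is unavailable
    mdadm --create /dev/md0 --level=1 --raid-devices=2 \
          /dev/sda1 --write-mostly /dev/sdb1

mdadm also has --write-behind (used together with a write-intent bitmap) if the slow half of the mirror sits behind a high-latency link.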
If you've got a SAN backend then a shared storage filesystem (GFS?) makes a lot more sense than replicated storage.
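Roughly, on a shared SAN LUN that would look like the sketch below (cluster name, filesystem name, journal count and LUN path are all placeholders; every node that mounts it needs the cman/DLM stack running):

    # one journal per node that will mount the filesystem
    mkfs.gfs2 -p lock_dlm -t mycluster:shared_data -j 4 /dev/mapper/san-lun
    mount -t gfs2 /dev/mapper/san-lun /shared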
We use DRBD at work. It works pretty well, but we only use it in a two node configuration. I wouldn't really consider it for anything more complicated.
W.r.t. software RAID/md: while DRBD is superficially just RAID 1 over the network, in reality it is significantly more complicated, e.g. in order to handle temporary network partitions without having to resync from scratch, and so forth.
Also, consider that software RAID-1 typically tries to balance the load on the drives by distributing reads somewhat evenly over them. Needless to say, this isn't a very good idea if one drive is local and the other is somewhere behind a potentially low bandwidth/high latency network link.
IOW, software RAID is not a good solution for replication.
Metro/stretch clusters can only be used in asynchronous or semi-synchronous replication mode, so that excludes md.
I have worked with Veritas Volume Manager, Cluster and Global Cluster at a $$$ company - I really liked it.
I've worked with host-based mirroring of SAN devices.
I have a couple of Xen clusters running DRBD with local disks for replication between two data centers (not too far away from each other). I just ran into some trouble last Friday after short network disconnects there...
What I really loved about the Veritas solution is that you can fine-tune every aspect. For a read-intensive DB application we tuned the volumes so that reads came from the primary data center, colocated with the clients - that gave an enormous performance boost.
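In VxVM terms that tuning was essentially the preferred-plex read policy, something like the command below (disk group, volume and plex names are placeholders from memory):

    # serve reads from the plex that lives in the primary data center
    vxvol -g appdg rdpol prefer dbvol dbvol-01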
So for storage-replication: If you can afford it - go for Veritas.
Now for the cluster software: I know Veritas, Sun, AIX/HACMP/HAGEO, HP Serviceguard and Linux-Heartbeat.
I liked Veritas best, and especially the way it prevents split-brain (its jeopardy mode)...
But you can achieve the same with any other cluster software if you use independent lines for the heartbeats - so invest in those lines instead of in the software.
I may cite Alan Robertson here: "A cluster is not a cluster unless you tested it."
And I have seen more downtime BECAUSE of a complex cluster setup than savings from such a setup. So keep it simple (Heartbeat v1 instead of v2).
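To illustrate the point about independent heartbeat lines with Heartbeat v1: a minimal ha.cf along the lines of the sketch below uses both a dedicated crossover NIC and a serial cable, so losing one path alone doesn't lead to a split-brain (interface and node names are examples, not from any real setup):

    # /etc/ha.d/ha.cf (Heartbeat v1 style)
    keepalive 2
    deadtime 30
    bcast eth1               # dedicated crossover link for heartbeats
    serial /dev/ttyS0        # second, independent heartbeat path
    baud 19200
    auto_failback off
    node nodea nodeb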