I need to implement an HA setup where two servers always stay in sync, no matter which one you write to. The DB part can be covered by a master-master replication setup. When it comes to files and content, however, I haven't found anything that suits these needs well. I need to replicate /var/www, for example, from one machine to the other, be able to write on either of them, and always have the same content available no matter which server the HTTP request lands on.
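For reference, the DB half looks roughly like this (a minimal master-master sketch, assuming MySQL for the sake of the example; server IDs, hosts, and credentials are placeholders):

```
# my.cnf fragment on server A (server B mirrors this with server-id = 2
# and auto_increment_offset = 2); hosts and credentials are placeholders
[mysqld]
server-id                = 1
log_bin                  = mysql-bin
auto_increment_increment = 2   # two masters writing, so step IDs by 2
auto_increment_offset    = 1   # avoids auto-increment collisions

# then on each box, point replication at the other master:
#   CHANGE MASTER TO MASTER_HOST='serverB', MASTER_USER='repl',
#     MASTER_PASSWORD='...', MASTER_LOG_FILE='...', MASTER_LOG_POS=...;
#   (file/position taken from SHOW MASTER STATUS on the other server)
#   START SLAVE;
```

The file side is the problem. Here's what I've looked at so far: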
- unison: easy to use and a simple concept, but it's more like a 2-way rsync: there is no automatic propagation of file changes unless you run it with its repeat option, and I'm not sure how reliable that is. I was hoping for a daemon-like feature that would 'watch' a folder's contents for changes (see the sketch just after this list).
- glusterfs: easy to configure and a nice project that seems perfect for what I need; however, it doesn't seem to handle this 2-way.
- xtreemfs: difficult to configure if you want replication (the docs are a bit hard to follow) and seems aimed more at the "distributed filesystem" side than at the replication aspect.
- ceph: similar to gluster, but again, I don't think it handles 2-way replication.
- mogilefs: not transparent; the application you build needs to be aware of it and use its services to access the file system. Not something I can use.
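For what it's worth, the unison repeat invocation I mean is roughly this (a sketch; the host and interval are made up):

```
# two-way sync of /var/www against a second box, re-run every 10 seconds
# (hypothetical host; -batch avoids interactive prompts)
unison /var/www ssh://server2//var/www -batch -repeat 10
```

Newer unison releases also accept `-repeat watch` together with the unison-fsmonitor helper, which is closer to the daemon-like 'watch' behaviour I'm after, but I don't know how reliable that is either.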
So I'm not sure whether 2-way replication is something that simply isn't done and I need to reconsider, or whether I just haven't researched this enough, but I'm at a loss; I can't seem to find other solutions.
Is there anything else out there that can handle automatic, transparent 2-way file replication?
How about DRBD in its dual-primary mode, with a cluster file system such as OCFS2 or GFS on top?
It can work surprisingly well, especially if your directory tree doesn't contain a huge number (say, millions or more) of frequently changing small files.
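A minimal dual-primary resource sketch, to give you the idea (hostnames, disks, and addresses are placeholders, and the exact syntax varies slightly between DRBD versions):

```
# /etc/drbd.d/r0.res -- dual-primary sketch; all names here are placeholders
resource r0 {
    protocol C;                   # synchronous replication; required for dual-primary
    net {
        allow-two-primaries;      # lets both nodes hold the Primary role
    }
    on node1 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   192.168.0.1:7788;
        meta-disk internal;
    }
    on node2 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   192.168.0.2:7788;
        meta-disk internal;
    }
}
```

Then promote both nodes (`drbdadm primary r0`), create the cluster filesystem once (e.g. `mkfs.ocfs2 /dev/drbd0`), and mount it on both with the cluster stack (o2cb or the DLM) running. A plain local filesystem like ext3 would get corrupted if mounted on both nodes at once, which is why the cluster filesystem is essential here.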
If you need fully automated replication at the filesystem level, then you need a shared filesystem (e.g. GFS) and some way of accessing it from both machines (DRBD, a SAN, multi-host SCSI, iSCSI). But apart from the hardware-based options, in my experience it's not a very robust solution.
For a similar setup I previously used rsync for major updates and a notification/polling system for interim ones, but that relied on a lot of smarts in the application code. I've also used AFS (although not for this kind of setup) and found it robust and scalable; you might want to take a look at it.
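A crude version of that notification approach, if you want to experiment (one direction only, with a hypothetical hostname; running it in both directions invites conflicts, which is why we put the smarts in the application instead):

```
#!/bin/sh
# push /var/www to the other box whenever something changes locally
# (requires inotify-tools; host and paths are placeholders)
inotifywait -m -r -e close_write -e create -e delete -e move /var/www |
while read -r dir event file; do
    rsync -az --delete /var/www/ server2:/var/www/
done
```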
I've managed to accomplish this with gluster using this kind of setup: http://www.howtoforge.com/high-availability-storage-cluster-with-glusterfs-on-ubuntu
The key is NOT to mount the volumes remotely but locally, and to let gluster replicate the folder behind the volumes on both machines; that way either machine can go down at any point.
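With a recent GlusterFS the same setup boils down to roughly this (hostnames, volume name, and brick paths are placeholders):

```
# on server1, with glusterd running on both boxes
gluster peer probe server2
gluster volume create www-vol replica 2 \
    server1:/data/www-brick server2:/data/www-brick
gluster volume start www-vol

# then, on EACH machine, mount the volume locally rather than remotely:
mount -t glusterfs localhost:/www-vol /var/www
```

Writes to /var/www on either box go through the local gluster client, which replicates them to both bricks, so losing one server doesn't lose the content. Note that newer releases warn about split-brain risk with a plain replica-2 volume, so test failure scenarios carefully.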
Ceph ought to work; that is the point of a distributed file system.
Do note that the more hosts you have writing to the same directory, the worse performance will be (because they have to be very careful about syncing); the alternative is living with some degree of propagation delay.
I could be wrong, and you should certainly test, but I would think either Gluster or Ceph would work. Do note that with Ceph you should assume you will need your backups at some point (there is no fsck for btrfs yet).
Run some tests and see how it works for you. Do make sure you verify with things like md5 sums and pull-the-power-cord style recovery.
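Something along these lines for the checksum comparison (hostname and paths are placeholders):

```
# checksum every file on both nodes and diff the lists; run this after
# a pull-the-plug test to confirm nothing diverged or got corrupted
(cd /var/www && find . -type f -print0 | sort -z | xargs -0 md5sum) > /tmp/local.md5
ssh server2 'cd /var/www && find . -type f -print0 | sort -z | xargs -0 md5sum' > /tmp/remote.md5
diff /tmp/local.md5 /tmp/remote.md5 && echo "nodes are in sync"
```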