I've been given the task of creating two CentOS 7 servers where not only the databases but also the files will be replicated. My problem is that there will probably be hundreds of thousands of files, if not a million, with a wide variety of sizes from a few KB to ~1 GB.
I've read about
- incron
- lsyncd
- git-annex
- ChironFS
Now I'd like to ask about your experiences with any of these, if you have used them or are currently using them. How is the performance with file changes, copies, and deletions? I'm very afraid of using anything rsync-based because my experience is that it is not very fast with a lot of small files, so I can't really use it for real-time file replication. Or am I wrong? Please prove me wrong. :)
Or maybe I'll need a 3rd and 4th server as file servers? If yes, then the question still remains: how do I replicate the files between the two servers in real time?
Cheers!
If your servers are on the same LAN, then a clustered filesystem (e.g. GlusterFS) or a shared storage solution (e.g. via NFS) would be the better choice.
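For the LAN case, a two-way replicated GlusterFS volume is straightforward to set up. A minimal sketch, assuming two nodes named server1/server2 with hypothetical brick paths:

```shell
# On server1, after installing glusterfs-server on both nodes:
gluster peer probe server2

# Create a 2-way replicated volume, one brick per server
# (hostnames and brick paths are placeholders).
gluster volume create gv0 replica 2 \
    server1:/data/brick1/gv0 server2:/data/brick1/gv0
gluster volume start gv0

# Each server (or client) then mounts the volume:
mount -t glusterfs server1:/gv0 /mnt/shared
```

Every file written under the mount point is then replicated to both bricks synchronously, so there is no separate replication job to schedule.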
If your servers are in different locations, with only WAN connectivity, the above solutions will not work well. In this case, and if you only need one-way replication (i.e. from the active to the backup server), lsyncd is a good solution. Another solution is csync2. A third possibility is to use DRBD + DRBD Proxy (please note that the proxy component is a commercial plugin).

Finally, if your servers only have WAN connectivity and you need bidirectional replication (i.e. both servers are active at the same time), basically no silver bullet exists. I'll list some possibilities, but I am far from recommending such a setup:
- unison with its real-time plugin
- psync, which I wrote exactly for solving a similar problem (but please note that it has its own share of idiosyncrasies, and I provide no support for it)
- syncthing with its real-time plugin (but it has significant limitations, namely it does not preserve ACLs nor the file's owner/group)

I use the ZFS filesystem and leverage its block-level replication using the zfs send/receive framework.
I use a handy script called syncoid to perform regular synchronization of filesystems at intervals from 15 seconds to hourly or daily, depending on the requirement.
Block-level replication is going to be cleaner and more accurate than rsync for the dataset you speak of.
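A minimal sketch of the send/receive cycle (the pool and dataset names and the backup host are hypothetical; syncoid automates the snapshot bookkeeping shown here):

```shell
# One-time full replication: snapshot the source dataset,
# then send the whole stream to the standby host.
zfs snapshot tank/files@repl-0
zfs send tank/files@repl-0 | ssh backuphost zfs receive -F backup/files

# Subsequent runs are incremental: only the blocks that changed
# between the two snapshots cross the wire.
zfs snapshot tank/files@repl-1
zfs send -i tank/files@repl-0 tank/files@repl-1 | \
    ssh backuphost zfs receive backup/files

# syncoid wraps the above and prunes old snapshots, e.g. from cron:
# syncoid tank/files root@backuphost:backup/files
```

Because the diff is computed at the block level from snapshot metadata, the cost does not depend on walking millions of small files the way rsync does.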
From my experience, distributed filesystems provide easy replication mechanisms for applications. However, they suffer from bad performance, especially when directories become very large with too many small files. This is expected, as they need to deal with locking and shared access from multiple locations/machines.
Rsync-like approaches provide, in some cases, acceptable replication with some delay. They don't affect application performance while reading/writing the replicated folder.
I think a better solution is to provide shared storage (when affordable) accessible from one server, with another standby server ready to mount the shared folder when the first one goes down. There is no need to replicate any data between servers.
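Sketched with NFS (the export host, export path, and mount point are all placeholders), the failover step on the standby node is just a mount:

```shell
# Normal operation: only the active server has the export mounted.
# On failover, the standby node takes over by mounting the same export:
mount -t nfs -o vers=4,hard storage01:/export/files /var/www/files

# Alternatively, keep the entry in /etc/fstab with "noauto" so a
# failover script only has to run "mount /var/www/files":
#   storage01:/export/files  /var/www/files  nfs  vers=4,hard,noauto  0 0
```

The hard mount option makes clients block rather than return I/O errors during short storage outages, which is usually what you want for application data.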
Cheers for the ideas. I've checked and tested them all, and I'm sticking to lsyncd.
Reasons: