I am currently re-architecting our backup solution and would like some input on how to do the replication portion. We have two sites, one much larger than the other, that we need to back up. I have two Linux servers that will share out their volumes via NFS, iSCSI, or SMB (I haven't fully decided which to go with yet). The files stored on these volumes will most likely be VMDKs - so just single, large files - containing backups made with either VMware Data Recovery or Veeam (I also haven't decided which to use).
Now comes the tricky part: I would like to replicate these VMDKs to both servers. So server A will have its own VMDK, and server B will have its own VMDK. Should I use something like rsync to periodically replicate the files themselves to the other server, or would it be better to use something like DRBD+GFS2 to replicate changes at the block level as they happen, essentially giving me an active/active clustered file system? Keep in mind that the VMDKs will not be modified in both locations. That is, server A will never modify a VMDK that is primarily housed on server B, and vice versa.
Please let me know if you need any more information and thanks for any input!
I've worked on a couple of similar systems in the past (for VM storage replication, no less!). In general I'm more comfortable with the scheduled-rsync solution, as we found DRBD setup to be tricky and a little brittle. The disadvantage is that if a failure happens between rsyncs, you lose whatever changed since the last completed sync. How frequently do these files change, and how recent does a backup have to be in case of failure?
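To put a rough number on that loss window, here is a back-of-the-envelope sketch. It assumes syncs start at a fixed interval and that each completed copy reflects the source roughly as it was when that sync started. The function name and the example figures are mine, purely for illustration.

```python
def worst_case_loss_minutes(sync_interval_min, sync_duration_min):
    """Upper bound on how stale the replica can be when the source fails.

    Assumption: a sync starts every `sync_interval_min` minutes and takes
    `sync_duration_min` minutes. If the source dies just before a sync
    finishes, the newest complete copy dates from the start of the
    previous sync.
    """
    return sync_interval_min + sync_duration_min


if __name__ == "__main__":
    # Hypothetical figures: hourly sync, 15 minutes to transfer the deltas.
    print(worst_case_loss_minutes(60, 15))
```

If that number is bigger than what you can afford to lose, that is the point where DRBD starts to look worth the trouble.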
DRBD is better in the sense that the block devices are updated continuously, which is a big plus and gave us faster VM failover. But we did find the setup (and the debugging when something went wrong) more difficult. I'd generally say use DRBD if you need that kind of redundancy, or rsync if you consider these more cold/infrequent backups.
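If you go the rsync route, a scheduled job could look something like the sketch below. The host name and paths are placeholders I made up, not anything from your setup. The flags are standard rsync options: `--inplace` updates the existing destination file rather than writing a full temporary copy, which matters for multi-gigabyte VMDKs, and `--no-whole-file` keeps the delta-transfer algorithm on. One caveat: with `--inplace`, an interrupted transfer leaves the destination file partially updated until the next successful run.

```python
import shlex
import subprocess


def build_rsync_command(source_dir, dest_host, dest_dir):
    """Build an rsync command line for replicating large image files."""
    return [
        "rsync",
        "-a",               # archive mode: recurse, keep times and permissions
        "--inplace",        # update destination files in place, no temp copy
        "--no-whole-file",  # keep delta transfer enabled
        f"{source_dir.rstrip('/')}/",
        f"{dest_host}:{dest_dir.rstrip('/')}/",
    ]


def replicate(source_dir, dest_host, dest_dir, dry_run=True):
    """Run the replication, or only print the command when dry_run is set."""
    cmd = build_rsync_command(source_dir, dest_host, dest_dir)
    if dry_run:
        print(shlex.join(cmd))
        return 0
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Placeholder host and paths; replace with your own before real use.
    replicate("/srv/backups/site-a", "server-b", "/srv/replica/site-a")
```

Run it from cron (or a systemd timer) on each server, each one pushing only the VMDKs it owns to the other. Since neither side ever modifies the other's files, you don't need any locking or conflict handling.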
(We also tried some fairly wild stuff: iSCSI (actually, SRP)-exported block devices from two different servers, with software RAID applied to the block devices on server #3. But we didn't keep that around long enough to test it much.)