I would like to understand what the best solution is for real-time replication between two ZFS on Linux (ZoL) boxes connected by a 10 GbE link. The goal is to use them for virtual machines; only one box at a time will run the virtual machines and the ZFS filesystem itself. Snapshots need to be possible on the first (active) box. I plan to use enterprise/nearline-grade SATA disks, so dual-port SAS disks are out of the question.
I have thought of the following possibilities:
- use iSCSI to export the remote disks and build a mirror between the local box's ZFS disks and the remote iSCSI disks. The biggest appeal of this solution is its simplicity, as it uses ZFS's own mirroring. On the other hand, ZFS will not give priority to the local disks over the remote ones, and that can cause some performance degradation (barely relevant on a 10 GbE network, I suppose). Moreover, and of greater concern, how will ZFS behave if the network link between the two boxes is lost? Will it re-sync the array when the remote machine becomes available again, or will manual intervention be required?
- use DRBD to synchronize two ZVOLs and lay ZFS on top of the DRBD device. In other words, I'm talking about a stacked ZVOL + DRBD + ZFS solution. This seems the preferred approach to me, as DRBD 8.4 is very stable and proven. However, many I/O layers are at play here and performance may suffer.
- use plain ZFS with GlusterFS on top. From the ZFS standpoint, this is the simplest solution, as all replication traffic is delegated to GlusterFS. Have you found GlusterFS stable enough?
Which do you feel is the best approach? Thanks.
I recommend either a clustered dual-node shared-SAS setup or continuous asynchronous replication at 15- or 30-second intervals. The former is good for continuity, while the latter provides a way to obtain geographic separation. They can be used together.
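To make the asynchronous option more concrete, here is a minimal sketch of a snapshot-and-send loop in Python. It is only an illustration under assumptions, not a tested replication tool: the dataset name `tank/vms`, the host name `node2`, the `repl-` snapshot prefix and all function names are hypothetical, and it assumes passwordless SSH to the standby box. It relies only on the standard `zfs snapshot`, `zfs send -i` and `zfs receive -F` commands.

```python
import subprocess
import time

DATASET = "tank/vms"   # hypothetical dataset holding the VM disks
REMOTE = "node2"       # hypothetical standby host, reachable over SSH
INTERVAL = 30          # seconds between replication runs


def snapshot_cmd(dataset, name):
    """Command that creates the snapshot dataset@name."""
    return ["zfs", "snapshot", f"{dataset}@{name}"]


def send_cmd(dataset, prev, new):
    """Full send when there is no previous snapshot, incremental otherwise."""
    if prev is None:
        return ["zfs", "send", f"{dataset}@{new}"]
    return ["zfs", "send", "-i", f"{dataset}@{prev}", f"{dataset}@{new}"]


def recv_cmd(remote, dataset):
    """Command that receives the stream on the standby host over SSH."""
    return ["ssh", remote, "zfs", "receive", "-F", dataset]


def replicate_once(dataset, remote, prev, new):
    """Take one snapshot and pipe it (incrementally if possible) to the remote."""
    subprocess.run(snapshot_cmd(dataset, new), check=True)
    sender = subprocess.Popen(send_cmd(dataset, prev, new), stdout=subprocess.PIPE)
    receiver = subprocess.run(recv_cmd(remote, dataset), stdin=sender.stdout)
    sender.stdout.close()
    if sender.wait() != 0 or receiver.returncode != 0:
        raise RuntimeError(f"replication of {dataset}@{new} failed")


def run_forever():
    """Replicate every INTERVAL seconds; stops on the first failure."""
    prev = None
    while True:
        new = f"repl-{int(time.time())}"
        replicate_once(DATASET, REMOTE, prev, new)
        prev = new
        time.sleep(INTERVAL)
```

Calling `run_forever()` on the active node would start the loop. The sketch never prunes old snapshots and does not recover the last common snapshot after a restart, both of which a real deployment would need to handle.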
However, if you want to experiment, you can use InfiniBand SRP or 100GbE RDMA to create a ZFS mirror between your two nodes.
For example, node1 and node2 each have a local disk (assume hardware RAID) and present that local storage over SRP. One node is in control of the zpool at a time, and that pool is composed of node1's local disk and node2's remote disk.
Your mirroring is synchronous because it's a ZFS mirror. Failover and consistency are handled by normal resilvering behavior. Zpool import/ownership/export is handled by Pacemaker and the standard cluster utilities...
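As a rough sketch of the zpool operations involved, the Python helpers below build the commands for creating the mirror, releasing the pool, taking it over on the other node, and bringing a returning disk back online so that it resilvers. The pool name, device paths and function names are hypothetical, and in practice a cluster resource agent under Pacemaker would issue the import and export rather than a hand-written script.

```python
import subprocess

POOL = "vmpool"                                 # hypothetical pool name
LOCAL_DISK = "/dev/disk/by-id/local-raid-lun"   # hypothetical local RAID volume
REMOTE_DISK = "/dev/disk/by-id/remote-srp-lun"  # hypothetical disk presented by the peer


def create_mirror_cmd(pool, local_dev, remote_dev):
    """Create a two-way mirror of the local disk and the peer's remote disk."""
    return ["zpool", "create", pool, "mirror", local_dev, remote_dev]


def release_cmd(pool):
    """Cleanly give up the pool on the node that currently owns it."""
    return ["zpool", "export", pool]


def takeover_cmd(pool, force=False):
    """Import the pool on the surviving node; force only after the peer is fenced."""
    cmd = ["zpool", "import"]
    if force:
        cmd.append("-f")
    cmd.append(pool)
    return cmd


def reattach_cmd(pool, dev):
    """Bring a returning mirror member back online so ZFS resilvers it."""
    return ["zpool", "online", pool, dev]


def run(cmd):
    """Execute one of the commands above, raising if it fails."""
    subprocess.run(cmd, check=True)
```

A planned switchover would then be `run(release_cmd(POOL))` on the old owner followed by `run(takeover_cmd(POOL))` on the new one. A forced import is only safe once the other node is known to be down, because importing the same pool on both nodes at once will corrupt it.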
Or you can use a commercial solution that does the same. See:
http://www.zeta.systems/blog/2016/10/11/High-Availability-Storage-On-Dell-PowerEdge-&-HP-ProLiant/