For testing purposes, I installed four GlusterFS nodes and created a replicated volume with a replica count of 4.
Two Gluster nodes reside in DC A, while the other two reside in DC B.
Now let's assume the four Gluster nodes have the following host names:
- gluster01 - located in DC A
- gluster02 - located in DC A
- gluster03 - located in DC B
- gluster04 - located in DC B
In my test scenario, I write a file to gluster01, which is located in DC A. How smart is the replication to the Gluster nodes in DC B? Will both nodes in DC B replicate from gluster01, or will only one node in DC B replicate from gluster01 while the second node in DC B replicates from the first?
The reason I am asking is that I want to avoid unnecessary replication traffic between my DCs.
I found some hints in the official Gluster documentation; however, the stuff I found didn't clear things up for me.
A Gluster client issues writes to all AFR replica servers simultaneously (http://blog.gluster.org/2010/06/video-how-gluster-automatic-file-replication-works/). In other words, the nodes in DC B do not replicate from gluster01 or from each other; the client itself sends a copy to every replica, so a write from a client in DC A crosses the inter-DC link once per node in DC B. A fellow at Red Hat has written something he calls "new style replication" (NSR), which implements something like what you are describing, but it is not implemented in the glusterfs git repo. While NSR is interesting, the part relevant to this answer is his discussion of what AFR does now.
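To put a rough number on that, here is a toy model of the client-side fan-out for the four nodes in the question. It is only a sketch: the function and variable names are made up for illustration and are not part of any GlusterFS API, and it counts file payload only, ignoring protocol overhead, metadata operations, and self-heal traffic.

```python
# Toy model only: names are illustrative, not a GlusterFS API.
# Assumption: the client writes one full copy to every replica (AFR behaviour
# as described above), and we count payload bytes only.

REPLICAS = {
    "gluster01": "A",
    "gluster02": "A",
    "gluster03": "B",
    "gluster04": "B",
}


def cross_dc_copies(client_dc, replicas=REPLICAS):
    """Number of replica writes that leave the client's data center."""
    return sum(1 for dc in replicas.values() if dc != client_dc)


def wan_bytes(client_dc, file_size_bytes, replicas=REPLICAS):
    """Payload bytes sent over the inter-DC link for one file write."""
    return cross_dc_copies(client_dc, replicas) * file_size_bytes


if __name__ == "__main__":
    size = 100 * 1024 * 1024  # a 100 MiB file
    copies = cross_dc_copies("A")
    print(f"copies crossing the link: {copies}")
    print(f"payload over the link: {wan_bytes('A', size)} bytes")
```

Under these assumptions, a client in DC A sends two full copies over the link for every write, one per node in DC B, rather than one copy that is then fanned out locally.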
The easiest way to conserve bandwidth between your two sites is to use geo-replication, but this may not be suitable for your application (e.g., if you need a multi-master setup).