I've been playing with Gluster for the last two days and have been asking questions here and on their question system. I really don't understand some of it. I see people saying things like
Set up replicated bricks between the servers (since you are only using 3, replicated would be safer), and each server will see the files of all other servers as being 'local' - even if one server fails, the files have been replicated to the other servers.
or
Gluster will maintain the file synchronization across volumes (bricks), and has 'self-healing' capabilities that will deal with any inconsistencies due to one server being offline.
Since I mount a remote volume from the server on the client(s), how does Gluster handle failure of the server node, the one the volumes are mounted from? From what I've tried, the folder on the client where the volume was mounted becomes inaccessible and I have to use umount to unblock it. And after that there is no content from the server.
This is basically what I don't see covered in any explanation: what happens when the server node fails, and is it possible to really replicate the content, the way unison or rsync do?
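For reference, here is roughly what I'm doing (hostnames and paths are placeholders, not my real setup):

    # on one of the servers: a replicated volume across my 3 nodes
    gluster volume create testvol replica 3 server1:/export/brick server2:/export/brick server3:/export/brick
    gluster volume start testvol

    # on the client: mount the volume through one server's address
    mount -t glusterfs server1:/testvol /mnt/testvol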
We recently started researching GlusterFS for our own use, so this question was interesting to me. Gluster uses what are called 'translators' in the FUSE client to control how data is stored. There are several types of translators, which are outlined here:
http://www.gluster.com/community/documentation/index.php/GlusterFS_Translators_v1.3
The one you are asking about specifically is called the Automatic File Replication Translator or AFR, and is covered in detail here:
http://www.gluster.com/community/documentation/index.php/Understanding_AFR_Translator
Looking at the source code, it appears that the data is actually written to the nodes simultaneously, which is much better than rsync!
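As a minimal sketch of what that looks like in practice, a hand-written client volume file of that era would wire the replicate (AFR) translator on top of two protocol/client translators, something like this (all names invented):

    # two remote bricks, one per server
    volume remote1
      type protocol/client
      option transport-type tcp
      option remote-host server1
      option remote-subvolume brick1
    end-volume

    volume remote2
      type protocol/client
      option transport-type tcp
      option remote-host server2
      option remote-subvolume brick2
    end-volume

    # AFR: every write is sent to both subvolumes
    volume replicated
      type cluster/replicate
      subvolumes remote1 remote2
    end-volume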
Regarding recovery from a failure, there is one interesting note I found. The Gluster system is different from Ceph in that it isn't actively aware of replication state changes and has to be 'triggered'. So if you lose a node in your cluster, you have to look up each file in order for Gluster to make sure it's replicated:
http://www.gluster.com/community/documentation/index.php/Gluster_3.2:_Triggering_Self-Heal_on_Replicate
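Per that page, triggering it amounts to forcing a lookup on every file through a client mount point (the mount path below is just an example):

    # stat every file on the glusterfs mount; each lookup
    # prompts AFR to check and repair that file's replicas
    find /mnt/gluster -noleaf -print0 | xargs --null stat >/dev/null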
I was unable to find a good page describing the internal failure-handling mechanisms, like how the client detects that things are broken. However, downloading the source code and looking through the client, it appears there are various timeouts it uses for commands, plus a probe it runs every so often against other systems in the cluster. Most of these carry TODO marks and aren't currently configurable except through source code modification, which may be a concern for you if convergence time is critical.
With just 2 nodes replicating, Gluster is not much different from an automatic rsync script. Things really only start to get interesting once you have 4 or more storage nodes: your client machines see a single pool of space, but the constituent files are spread across all the storage nodes (bricks). This means that if each of your 4 servers has 10TB of local space, your client machines can see a single namespace of 20TB (replicated, or 40TB of unprotected storage).
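As a sketch of that layout (hostnames and brick paths invented), the volume would be created with something like:

    # bricks are paired in the order given: (server1,server2), (server3,server4);
    # each pair is a replica set, and files are distributed across the pairs
    gluster volume create bigvol replica 2 transport tcp \
        server1:/export/brick server2:/export/brick \
        server3:/export/brick server4:/export/brick
    gluster volume start bigvol

Leave out the 'replica 2' and you get the plain distributed, unprotected 40TB variant instead.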
I've seen a brief hiccup -- maybe 30 seconds or so -- on a client machine when it tries IO after a storage brick becomes unavailable. After the hiccup, though, IO will continue normally as long as there are servers online that still hold a full set of the volume data.
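For what it's worth, that pause is consistent with the client's ping timeout, which defaults to 42 seconds; if your version exposes it as a volume option, it can be lowered (volume name here is an example):

    # make clients declare a brick dead after 10s instead of the 42s default
    # (values that are too low can cause spurious disconnects under load)
    gluster volume set bigvol network.ping-timeout 10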
You're describing behavior that's unexpected - I would consult #gluster on irc.freenode.net, [email protected], or http://community.gluster.org/
- John Mark, Gluster Community Guy
When the client-facing server fails (i.e. the server whose IP/DNS name the client used to mount the filesystem), the entire volume goes offline for that client, i.e. it can't read from or write to the volume.
However, if the client mounted the volume using the IP/DNS name of another server, the volume will still be online for that client; reads and writes simply will not go to the failed/crashed instance.
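If your version's mount.glusterfs supports it, you can also name a fallback server for fetching the volume file at mount time, so the mount survives the primary being down when it is issued (hostnames are examples):

    # try server2 for the volume file if server1 is unreachable at mount time
    mount -t glusterfs -o backupvolfile-server=server2 server1:/testvol /mnt/testvol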