I am currently thinking about migrating some of our servers and apps to a CoreOS environment. One of the problems I see is the management of persistent data, as CoreOS does not handle Docker volumes when moving a container to a new machine. After some research I found GlusterFS, which claims to be a cluster file system that could solve all my problems.
My current idea is this: I have a GlusterFS container which runs as a privileged container on each of my CoreOS machines and exposes a storage path, /mnt/gluster, for example. In my Dockerfiles I specify that all my volumes should be mounted on this path.
The next thing I considered was which containers should get their own volumes and which ones should share one. For example, every mysql container would get its own volume, as MySQL is able to handle replication by itself. I don't want to mess around with that. Webservers serving the same website would probably use the same volume for things like user-uploaded images, as they are not able to replicate that data themselves.
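A minimal sketch of that layout, for illustration only. It assumes the host side of each volume is chosen when the container is started (the `-v host:container` option of `docker run`), since a Dockerfile can only declare the path inside the container. The helper `volume_flag`, the directory names, and the instance name `db1` are hypothetical.

```python
from pathlib import PurePosixPath

# Mount point taken from the question; everything below it is an assumed layout.
GLUSTER_ROOT = PurePosixPath("/mnt/gluster")


def volume_flag(name, container_path, instance=None):
    """Build the `-v` argument pair for `docker run`.

    name:     hypothetical logical volume name, e.g. "mysql" or "uploads"
    instance: if given, the container gets a private subdirectory;
              if None, every container using `name` shares one directory.
    """
    host_dir = GLUSTER_ROOT / name
    if instance is not None:
        host_dir = host_dir / str(instance)
    return ["-v", f"{host_dir}:{container_path}"]


# Each MySQL container gets its own directory ...
print(volume_flag("mysql", "/var/lib/mysql", instance="db1"))
# ... while all web servers of one site share the uploads directory.
print(volume_flag("uploads", "/var/www/uploads"))
```

The first call yields /mnt/gluster/mysql/db1 as the host directory, the second /mnt/gluster/uploads, so the private-versus-shared decision comes down to whether an instance name is passed.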
Has anybody tried something like this or is there anything I have missed?
We have deployed a similar setup with Atomic (http://www.projectatomic.io/) instead of CoreOS to a replicated non-distributed GlusterFS storage system with three replica-2 sets. This works very well.
However, you need to keep a few special characteristics of GlusterFS in mind. As Brian already mentioned, Gluster places consistency and reliability above all. The more frequently data changes, the more replication takes place. This puts a lot, and I mean A LOT, of pressure on your system.
Make sure that your IO subsystem is fast (duh, it's storage), and connect your Gluster nodes with the fastest network connections available. If you have only GBit, aggregate! Last but not least, the storage system must have serious computing power, because Gluster does a lot of computation to check its state. That being said, even under high load, Gluster delivers.
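To get a feel for why the network matters, here is a rough back-of-the-envelope estimate. It assumes Gluster's native client, where the client itself sends each write to every replica, and it treats the client's network links as the only bottleneck while ignoring protocol overhead. The function name and the numbers are illustrative, not measurements.

```python
def max_write_mb_per_s(link_gbit, replicas, links=1):
    """Upper bound on client write throughput in MB/s.

    Assumes the client sends every write to all replicas over its own
    links and that nothing else limits throughput (a simplification).
    """
    link_mb_per_s = link_gbit * 1000 / 8  # 1 Gbit/s is 125 MB/s
    return link_mb_per_s * links / replicas


print(max_write_mb_per_s(1, replicas=2))           # single GBit link
print(max_write_mb_per_s(1, replicas=2, links=2))  # two aggregated links
```

Under these assumptions a replica-2 volume behind a single GBit link cannot exceed about 62.5 MB/s for writes, and aggregating two links brings the ceiling back to about 125 MB/s. Real throughput will be lower.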
Reconsider your MySQL strategy. Gluster does the replication for you and also provides sort-of load-balancing in delivery. It might actually be faster to use Gluster.
The use of glusterfs would depend on the storage backend that you are using. As a cluster file system it is intended to cluster physical storage so it appears as one large continuous volume. This official quick start guide has a good explanation of the process.
In the event that your setup utilizes two or more separate backend storage servers or something similar to store all of the docker volumes, then using glusterfs or some other similar parallel file system may offer significant performance advantages. If this is the case you could also consider using Lustre, which is widely used as a parallel filesystem in the HPC community.
That being said, tuning, debugging and configuring parallel/cluster filesystems can be a time-consuming task which requires a lot of expertise, patience and sometimes a willingness to restart from the beginning. It would be prudent to make sure that the performance benefits a parallel file system offers are worth the effort required to set it up and maintain it.